Improved targeted DNA insertion in plants.
Field of the invention 

The current invention relates to the field of molecular plant biology, more specifically to the
field of plant genome engineering. Methods are provided for the directed introduction of a
foreign DNA fragment at a preselected insertion site in the genome of a plant. Plants
containing the foreign DNA inserted at a particular site can now be obtained at a higher 
frequency and with greater accuracy than is possible with the currently available targeted 
DNA insertion methods. Moreover, in a large proportion of the resulting plants, the foreign 
DNA has only been inserted at the preselected insertion site, without the foreign DNA also 
having been inserted randomly at other locations in the plant's genome. The methods of the 
invention are thus an improvement, both quantitatively and qualitatively, over the prior art 
methods. Also provided are chimeric genes, plasmids, vectors and other means to be used in 
the methods of the invention. 

Background art 

The generation of the first transgenic plants in the early 1980s by Agrobacterium-mediated
transformation technology has spurred the development of other methods to
introduce a foreign DNA of interest or a transgene into the genome of a plant, such as
PEG-mediated DNA uptake in protoplasts, microprojectile bombardment, silicon whisker-mediated
transformation etc.



All the plant transformation methods, however, have in common that the transgenes 
incorporated in the plant genome are integrated in a random fashion and in unpredictable 
copy number. Frequently, the transgenes can be integrated in the form of repeats, either of 
the whole transgene or of parts thereof. Such a complex integration pattern may influence the 
expression level of the transgenes, e.g. by destruction of the transcribed RNA through 
posttranscriptional gene silencing mechanisms or by inducing methylation of the introduced 
DNA, thereby downregulating the transcriptional activity on the transgene. Also, the 
integration site per se can influence the level of expression of the transgene. The 
combination of these factors results in a wide variation in the level of expression of the
transgenes or foreign DNA of interest among different transgenic plant cell and plant lines. 
Moreover, the integration of the foreign DNA of interest may have a disruptive effect on the 
region of the genome where the integration occurs, and can influence or disturb the normal 
function of that target region, thereby leading to, often undesirable, side-effects. 

Therefore, whenever the effect of introduction of a particular foreign DNA into a plant is 
investigated, it is required that a large number of transgenic plant lines are generated and 
analysed in order to obtain significant results. Likewise, in the generation of transgenic crop 
plants, where a particular DNA of interest is introduced in plants to provide the transgenic 
plant with a desired, known phenotype, a large population of independently created 
transgenic plant lines or so-called events is created, to allow the selection of those plant lines 
with optimal expression of the transgenes, and with minimal, or no, side-effects on the 
overall phenotype of the transgenic plant. Particularly in this field, it would be advantageous 
if this trial-and-error process could be replaced by a more directed approach, in view of the 
burdensome regulatory requirements and high costs associated with the repeated field trials 
required for the elimination of the unwanted transgenic events. Furthermore, it will be clear 
that the possibility of targeted DNA insertion would also be beneficial in the process of so- 
called transgene stacking. 

The need to control transgene integration in plants has been recognized early on, and several 
methods have been developed in an effort to meet this need (for a review see Kumar and 
Fladung, 2001, Trends in Plant Science, 6, pp155-159). These methods mostly rely on
homologous recombination-based transgene integration, a strategy which has been
successfully applied in prokaryotes and lower eukaryotes (see e.g. EP0317509 or the
corresponding publication by Paszkowski et al., 1988, EMBO J., 7, pp4021-4026). However,
for plants, the predominant mechanism for transgene integration is based on illegitimate
recombination which involves little homology between the recombining DNA strands. A
major challenge in this area is therefore the detection of the rare homologous recombination 
events, which are masked by the far more efficient integration of the introduced foreign 
DNA via illegitimate recombination. 

One way of solving this problem is by selecting against the integration events that have
occurred by illegitimate recombination, such as exemplified in WO94/17176.

Another way of solving the problem is by activation of the target locus and/or repair or donor 
DNA through the induction of double stranded DNA breaks via rare-cutting endonucleases, 
such as I-SceI. This technique has been shown to increase the frequency of homologous
recombination by at least two orders of magnitude using Agrobacteria to deliver the repair
DNA to the plant cells (Puchta et al., 1996, Proc. Natl. Acad. Sci. U.S.A., 93, pp5055-5060;
Chilton and Que, Plant Physiol., 2003).

WO96/14408 describes an isolated DNA encoding the enzyme I-SceI. This DNA sequence
can be incorporated in cloning and expression vectors, transformed cell lines and transgenic 
animals. The vectors are useful in gene mapping and site-directed insertion of genes. 

WO00/46386 describes methods of modifying, repairing, attenuating and inactivating a gene 
or other chromosomal DNA in a cell through an I-SceI double strand break. Also disclosed are
methods of treatment or prophylaxis of a genetic disease in an individual in need thereof.
Further disclosed are chimeric restriction endonucleases. 

However, there still remains a need for improving the frequency of targeted insertion of a 
foreign DNA in the genome of a eukaryotic cell, particularly in the genome of a plant cell. 
These and other problems are solved as described hereinafter in the different detailed 
embodiments of the invention, as well as in the claims. 

Summary of the invention 

In one embodiment, the invention provides a method for introducing a foreign DNA of 
interest, which may be flanked by a DNA region having at least 80% sequence identity to a 
DNA region flanking a preselected site, into a preselected site, such as an I-SceI site of a
genome of a plant cell, such as a maize cell, comprising the steps of

(a) inducing a double stranded DNA break at the preselected site in the genome of the
cell, e.g. by introducing an I-SceI encoding gene;

(b) introducing the foreign DNA of interest into the plant cell;
characterized in that the foreign DNA is delivered by direct DNA transfer which may be 
accomplished by bombardment of microprojectiles coated with the foreign DNA of interest. 
The I-SceI encoding gene can comprise a nucleotide sequence encoding the amino acid
sequence of SEQ ID No 1, wherein said nucleotide sequence has a GC content of about 50% 
to about 60%, provided that 

i) the nucleotide sequence does not comprise a nucleotide sequence selected from the 
group consisting of GATAAT, TATAAA, AATATA, AATATT, GATAAA, 
AATGAA, AATAAG, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, 
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, 
ATTAAA, AATTAA, AATACA and CATAAA;

ii) the nucleotide sequence does not comprise a nucleotide sequence selected from the group
consisting of CCAAT, ATTGG, GCAAT and ATTGC; 

iii) the nucleotide sequence does not comprise a sequence selected from the group 
consisting of ATTTA, AAGGT, AGGTA, GGTA or GCAGG; 

iv) the nucleotide sequence does not comprise a GC stretch consisting of 7 consecutive 
nucleotides selected from the group of G or C; 

v) the nucleotide sequence does not comprise an AT stretch consisting of 5 consecutive
nucleotides selected from the group of A or T; and 

vi) the nucleotide sequence does not comprise the codons TTA, CTA, ATA, GTA, TCG, 
CCG, ACG and GCG. An example of such an I-SceI encoding gene comprises the
nucleotide sequence of SEQ ID No 4.

The plant cell may be incubated in a plant phenolic compound prior to step a). 



In another embodiment, the invention relates to a method for introducing a foreign DNA of 
interest into a preselected site of a genome of a plant cell comprising the steps of 

(a) inducing a double stranded DNA break at the preselected site in the genome of the 
cell; 

(b) introducing the foreign DNA of interest into the plant cell;
characterized in that the double stranded DNA break is introduced by a rare cutting 
endonuclease encoded by a nucleotide sequence wherein said nucleotide sequence has a 
GC content of about 50% to about 60%, provided that 

i) the nucleotide sequence does not comprise a nucleotide sequence selected from
the group consisting of GATAAT, TATAAA, AATATA, AATATT, GATAAA,
AATGAA, AATAAG, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA,
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA,
ATTAAA, AATTAA, AATACA and CATAAA;

ii) the nucleotide sequence does not comprise a nucleotide sequence selected from the group
consisting of CCAAT, ATTGG, GCAAT and ATTGC; 

iii) the nucleotide sequence does not comprise a sequence selected from the group 
consisting of ATTTA, AAGGT, AGGTA, GGTA or GCAGG; 

iv) the nucleotide sequence does not comprise a GC stretch consisting of 7 
consecutive nucleotides selected from the group of G or C; 

v) the nucleotide sequence does not comprise an AT stretch consisting of 5
consecutive nucleotides selected from the group of A or T; and 

vi) the nucleotide sequence does not comprise the codons TTA, CTA, ATA, GTA, 
TCG, CCG, ACG and GCG. 

In yet another embodiment, the invention relates to a method for introducing a foreign DNA 
of interest into a preselected site of a genome of a plant cell comprising the steps of 

(a) inducing a double stranded DNA break at the preselected site in the genome of the 
cell; 

(b) introducing the foreign DNA of interest into the plant cell ; 

characterized in that prior to step (a), the plant cells are incubated in a plant phenolic
compound which may be selected from the group of acetosyringone
(3,5-dimethoxy-4-hydroxyacetophenone), α-hydroxy-acetosyringone, sinapinic acid
(3,5-dimethoxy-4-hydroxycinnamic acid), syringic acid (4-hydroxy-3,5-dimethoxybenzoic acid),
ferulic acid (4-hydroxy-3-methoxycinnamic acid), catechol (1,2-dihydroxybenzene),
p-hydroxybenzoic acid (4-hydroxybenzoic acid), β-resorcylic acid (2,4-dihydroxybenzoic acid),
protocatechuic acid (3,4-dihydroxybenzoic acid), pyrrogallic acid (2,3,4-trihydroxybenzoic acid),
gallic acid (3,4,5-trihydroxybenzoic acid) and vanillin (3-methoxy-4-hydroxybenzaldehyde).

The invention also provides an isolated DNA fragment comprising a nucleotide sequence
encoding the amino acid sequence of SEQ ID No 1, wherein the nucleotide sequence has a 
GC content of about 50% to about 60%, provided that 

i) the nucleotide sequence does not comprise a nucleotide sequence selected from
the group consisting of GATAAT, TATAAA, AATATA, AATATT, GATAAA,
AATGAA, AATAAG, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA,
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA,
ATTAAA, AATTAA, AATACA and CATAAA;

ii) the nucleotide sequence does not comprise a nucleotide sequence selected from the group
consisting of CCAAT, ATTGG, GCAAT and ATTGC; 

iii) the nucleotide sequence does not comprise a sequence selected from the group 
consisting of ATTTA, AAGGT, AGGTA, GGTA or GCAGG; 

iv) the nucleotide sequence does not comprise a GC stretch consisting of 7 
consecutive nucleotides selected from the group of G or C; 

v) the nucleotide sequence does not comprise an AT stretch consisting of 5
consecutive nucleotides selected from the group of A or T; and 

vi) codons of said nucleotide sequence coding for leucine (Leu), isoleucine (Ile),
valine (Val), serine (Ser), proline (Pro), threonine (Thr), alanine (Ala) do not
comprise TA or CG duplets in positions 2 and 3 of said codons.
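
By way of illustration only, the codon restriction above can be enumerated programmatically. The following minimal sketch assumes the standard (universal) genetic code and assumes that the excluded duplets are TA and CG, consistent with the explicit codon list given elsewhere in this document; the names CODON_TABLE, RESTRICTED_AMINO_ACIDS and forbidden_codons are hypothetical and are not part of the disclosure.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (one-letter amino acids, * = stop).
# Assumption: the synthetic coding region is read with the universal code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Amino acids named in the rule: Leu, Ile, Val, Ser, Pro, Thr, Ala.
RESTRICTED_AMINO_ACIDS = set("LIVSPTA")
# Duplets that may not occupy codon positions 2 and 3 for those amino acids.
FORBIDDEN_DUPLETS = {"TA", "CG"}


def forbidden_codons():
    """Return the sorted codons for the restricted amino acids that carry
    a forbidden duplet in positions 2 and 3."""
    return sorted(
        codon
        for codon, aa in CODON_TABLE.items()
        if aa in RESTRICTED_AMINO_ACIDS and codon[1:] in FORBIDDEN_DUPLETS
    )
```

Under these assumptions the enumeration yields the eight codons TTA, CTA, ATA, GTA, TCG, CCG, ACG and GCG, i.e. the same set that is listed explicitly in the corresponding rule of the other embodiments.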

The invention also provides an isolated DNA sequence comprising the nucleotide sequence 
of SEQ ID No 4, as well as a chimeric gene comprising the isolated DNA fragment according
to the invention operably linked to a plant-expressible promoter and the use of such a
chimeric gene to insert a foreign DNA into an I-SceI recognition site in the genome of a
plant. 

In yet another embodiment of the invention, a method is provided for introducing a foreign
DNA of interest into a preselected site of a genome of a plant cell comprising the steps of 

a) inducing a double stranded DNA break at the preselected site in the genome of the cell 
by a rare cutting endonuclease;

b) introducing the foreign DNA of interest into the plant cell ; 
characterized in that said endonuclease comprises a nuclear localization signal. 

Brief description of the figures 

Table 1 represents the possible trinucleotide (codon) choices for a synthetic I-SceI coding
region (see also the nucleotide sequence in SEQ ID No 2).

Table 2 represents preferred possible trinucleotide choices for a synthetic I-SceI coding
region (see also the nucleotide sequence in SEQ ID No 3). 

Figure 1 : Schematic representation of the target locus (A) and the repair DNA (B) used in the 
assay for homologous recombination mediated targeted DNA insertion. The target locus after 
recombination is also represented (C). DSB site: double stranded DNA break site; 
3'g7: transcription termination and polyadenylation signal of A. tumefaciens gene 7; neo:
plant expressible neomycin phosphotransferase; 35S: promoter of the CaMV 35S transcript;
5' bar: DNA region encoding the amino terminal portion of the phosphinothricin
acetyltransferase; 3'nos: transcription termination and polyadenylation signal of A. 
tumefaciens nopaline synthetase gene; Pnos: promoter of the nopaline synthetase gene of A. 
tumefaciens; 3'ocs: 3' transcription termination and polyadenylation signal of the octopine 
synthetase gene of A. tumefaciens. 

Detailed description 

The current invention is based on the following findings: 

a) Introduction into the plant cells of the foreign DNA to be inserted via direct DNA
transfer, particularly microprojectile bombardment, unexpectedly increased the 
frequency of targeted insertion events. All of the obtained insertion events were targeted 
DNA insertion events, which occurred at the site of the induced double stranded DNA 
break. Moreover all of these targeted insertion events appeared to be exact recombination 
events between the provided sequence homology flanking the double stranded DNA 
break. Only about half of these events had an additional insertion of the foreign DNA at a 
site different from the site of the induced double stranded DNA break. 

b) Induction of the double stranded DNA break by transient expression of a rare-cutting 
double stranded break inducing endonuclease, such as I-SceI, encoded by a chimeric gene
comprising a synthetic coding region for a rare-cutting endonuclease such as I-SceI
designed according to a preselected set of rules surprisingly increased the quality of the 
resulting targeted DNA insertion events (i.e. the frequency of perfectly targeted DNA 
insertion events). Furthermore, the endonuclease had been equipped with a nuclear 
localization signal. 

c) Preincubation of the target cells in a plant phenolic compound, such as acetosyringone, 
further increased the frequency of targeted insertion at double stranded DNA breaks 
induced in the genome of a plant cell. 

Any of the above findings, either alone or in combination, improves the frequency with 
which homologous recombination based targeted insertion events can be obtained, as well as 
the quality of the recovered events. 

Thus, in one aspect, the invention relates to a method for introducing a foreign DNA of 
interest into a preselected site of a genome of a plant cell comprising the steps of 

(a) inducing a double stranded DNA break at the preselected site in the genome of the 
cell ; 

(b) introducing the foreign DNA of interest into the plant cell ; 
characterized in that the foreign DNA is delivered by direct DNA transfer. 

As used herein "direct DNA transfer" is any method of DNA introduction into plant cells
which does not involve the use of natural Agrobacterium spp. capable of
introducing DNA into plant cells. This includes methods well known in the art such as
introduction of DNA by electroporation into protoplasts, introduction of DNA by 
electroporation into intact plant cells or partially degraded tissues or plant cells, introduction 
of DNA through the action of agents such as PEG and the like, into protoplasts, and 
particularly bombardment with DNA coated microprojectiles. Introduction of DNA by direct 
transfer into plant cells differs from Agrobacterium-mediated DNA introduction at least in 
that double stranded DNA enters the plant cell, in that the entering DNA is not coated with 
any protein, and in that the amount of DNA entering the plant cell may be considerably 
greater. Furthermore, DNA introduced by direct transfer methods, such as the introduced 
chimeric gene encoding a double stranded DNA break inducing endonuclease, may be more 
amenable to transcription, resulting in a better timing of the induction of the double stranded 
DNA break. Although not intending to limit the invention to a particular mode of action, it is 
thought that the efficient homologous-recombination-based insertion of repair DNA or foreign
DNA in the genome of a plant cell may be due to a combination of any of these parameters. 

Conveniently, the double stranded DNA break may be induced at the preselected site by 
transient expression after introduction of a plant-expressible gene encoding a rare cleaving 
double stranded break inducing enzyme. As set forth elsewhere in this document, I-SceI may
be used for that purpose to introduce a foreign DNA at an I-SceI recognition site. However,
it will be immediately clear to the person skilled in the art that also other double stranded 
break inducing enzymes can be used to insert the foreign DNA at their respective recognition 
sites. A list of rare cleaving DSB inducing enzymes and their respective recognition sites is 
provided in Table I of WO 03/004659 (pages 17 to 20) (incorporated herein by reference). 
Furthermore, methods are available to design custom-tailored rare-cleaving endonucleases 
that recognize basically any target nucleotide sequence of choice. Such methods have been 
described e.g. in WO 03/080809, WO94/18313 or WO95/09233 and in Isalan et al., 2001,
Nature Biotechnology 19, 656-660; Liu et al., 1997, Proc. Natl. Acad. Sci. USA 94,
5525-5530.

Thus, as used herein "a preselected site" indicates a particular nucleotide sequence in the 
plant nuclear genome at which location it is desired to insert the foreign DNA. A person 
skilled in the art would be perfectly able to either choose a double stranded DNA break 
inducing ("DSBI") enzyme recognizing the selected target nucleotide sequence or engineer
such a DSBI endonuclease. Alternatively, a DSBI endonuclease recognition site may be 
introduced into the plant genome using any conventional transformation method or by 
conventional breeding using a plant line having a DSBI endonuclease recognition site in its 
genome, and any desired foreign DNA may afterwards be introduced into that previously 
introduced preselected target site. 

The double stranded DNA break may be induced conveniently by transient introduction of a 
plant-expressible chimeric gene comprising a plant-expressible promoter region operably 
linked to a DNA region encoding a double stranded break inducing enzyme. The DNA 
region encoding a double stranded break inducing enzyme may be a synthetic DNA region, 
such as but not limited to, a synthetic DNA region whereby the codons are chosen according 
to the design scheme as described elsewhere in this application for I-SceI encoding regions.

The double stranded break inducing enzyme may comprise, but need not comprise, a nuclear 
localization signal (NLS) [Raikhel, Plant Physiol. 100: 1627-1632 (1992) and references 
therein], such as the NLS of SV40 large T-antigen [Kalderon et al. Cell 39: 499-509 (1984)]. 
The nuclear localization signal may be located anywhere in the protein, but is conveniently 
located at the N-terminal end of the protein. The nuclear localization signal may replace one 
or more of the amino acids of the double stranded break inducing enzyme. 

As used herein "foreign DNA of interest" indicates any DNA fragment which one may want 
to introduce at the preselected site. Although it is not strictly required, the foreign DNA of 
interest may be flanked by at least one nucleotide sequence region having homology to a 
DNA region flanking the preselected site. The foreign DNA of interest may be flanked at 
both sites by DNA regions having homology to both DNA regions flanking the preselected 
site. Thus the repair DNA molecule(s) introduced into the plant cell may comprise a foreign 
DNA flanked by one or two flanking sequences having homology to the DNA regions 
respectively upstream or downstream of the preselected site. This allows better control of the
insertion of the foreign DNA. Indeed, integration by homologous recombination will allow 
precise joining of the foreign DNA fragment to the plant nuclear genome up to the 
nucleotide level. 

The flanking nucleotide sequences may vary in length, and should be at least about 10
nucleotides in length. However, the flanking region may be as long as is practically possible 
(e.g. up to about 100-150 kb such as complete bacterial artificial chromosomes (BACs)). 
Preferably, the flanking region will be about 50 bp to about 2000 bp. Moreover, the regions 
flanking the foreign DNA of interest need not be identical to the DNA regions flanking the 
preselected site and may have between about 80% to about 100% sequence identity, 
preferably about 95% to about 100% sequence identity with the DNA regions flanking the 
preselected site. The longer the flanking region, the less stringent the requirement for 
homology. Furthermore, it is preferred that the sequence identity is as high as practically 
possible in the vicinity of the location of exact insertion of the foreign DNA. 

Moreover, the regions flanking the foreign DNA of interest need not have homology to the 
regions immediately flanking the preselected site, but may have homology to a DNA region 
of the nuclear genome further remote from that preselected site. Insertion of the foreign 
DNA will then result in a removal of the target DNA between the preselected insertion site 
and the DNA region of homology. In other words, the target DNA located between the
homology regions will be replaced by the foreign DNA of interest.

For the purpose of this invention, the "sequence identity" of two related nucleotide or amino 
acid sequences, expressed as a percentage, refers to the number of positions in the two 
optimally aligned sequences which have identical residues (x100) divided by the number of
positions compared. A gap, i.e. a position in an alignment where a residue is present in one
sequence but not in the other, is regarded as a position with non-identical residues. The
alignment of the two sequences is performed by the Needleman and Wunsch algorithm
(Needleman and Wunsch 1970). Computer-assisted sequence alignment can be conveniently
performed using a standard software program such as GAP, which is part of the Wisconsin
Package Version 10.1 (Genetics Computer Group, Madison, Wisconsin, USA) using the 
default scoring matrix with a gap creation penalty of 50 and a gap extension penalty of 3. 
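
By way of illustration only, the percentage defined above can be computed from two sequences that have already been optimally aligned. The following is a minimal sketch and does not perform the Needleman and Wunsch alignment itself; it assumes that both aligned rows have equal length and that gaps are written as "-". The function name percent_identity and the gap symbol are hypothetical choices and not part of the GAP program.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Sequence identity (%) of two pre-aligned sequences.

    Identical positions x 100 divided by the number of positions compared.
    A position where only one sequence has a residue (a gap) is counted as
    a compared, non-identical position.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            # No residue in either row: nothing to compare at this column.
            continue
        compared += 1
        if x == y:
            identical += 1

    if compared == 0:
        raise ValueError("no positions to compare")
    return 100.0 * identical / compared
```

For example, an alignment of ACGT-A against ACGTTA has five identical positions out of six compared positions, the gap counting as a mismatch.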

In another aspect, the invention relates to a modified I-SceI encoding DNA fragment, and the
use thereof to efficiently introduce a foreign DNA of interest into a preselected site of a
genome of a plant cell, whereby the modified I-SceI encoding DNA fragment has a
nucleotide sequence which has been designed to fulfill the following criteria:

a) the nucleotide sequence encodes a functional I-SceI endonuclease, such as an
I-SceI endonuclease having the amino acid sequence as provided in SEQ ID No 1;

b) the nucleotide sequence has a GC content of about 50% to about 60%;

c) the nucleotide sequence does not comprise a nucleotide sequence selected from 
the group consisting of GATAAT, TATAAA, AATATA, AATATT, GATAAA, 
AATGAA, AATAAG, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, 
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, 
ATTAAA, AATTAA, AATACA and CATAAA; 

d) the nucleotide sequence does not comprise a nucleotide sequence selected from the group
consisting of CCAAT, ATTGG, GCAAT and ATTGC; 

e) the nucleotide sequence does not comprise a sequence selected from the group 
consisting of ATTTA, AAGGT, AGGTA, GGTA or GCAGG; 

f) the nucleotide sequence does not comprise a GC stretch consisting of 7 
consecutive nucleotides selected from the group of G or C; 

g) the nucleotide sequence does not comprise an AT stretch consisting of 5
consecutive nucleotides selected from the group of A or T; and

h) the nucleotide sequence does not comprise codons coding for Leu, Ile, Val, Ser,
Pro, Thr, Ala that comprise TA or CG duplets in positions 2 and 3 (i.e. the 
nucleotide sequence does not comprise the codons TTA, CTA, ATA, GTA, TCG, 
CCG, ACG and GCG). 
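
By way of illustration only, criteria b) to h) above lend themselves to an automated check of a candidate coding sequence. The following minimal sketch makes several assumptions that are not stated in this document: the "about 50% to about 60%" range is applied as strict bounds, a stretch "consisting of" 7 or 5 nucleotides is read as that length or longer, the AT stretch criterion is read as referring to A or T nucleotides, and the sequence is assumed to start in frame at position 1. Criterion a) (functionality of the encoded enzyme) is not checked. The names FORBIDDEN_MOTIFS, FORBIDDEN_CODONS, gc_content and check_design are hypothetical.

```python
import re
from typing import List

FORBIDDEN_MOTIFS = [
    # criterion c)
    "GATAAT", "TATAAA", "AATATA", "AATATT", "GATAAA", "AATGAA", "AATAAG",
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA",
    "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA",
    "AATACA", "CATAAA",
    # criterion d)
    "CCAAT", "ATTGG", "GCAAT", "ATTGC",
    # criterion e)
    "ATTTA", "AAGGT", "AGGTA", "GGTA", "GCAGG",
]
# criterion h)
FORBIDDEN_CODONS = {"TTA", "CTA", "ATA", "GTA", "TCG", "CCG", "ACG", "GCG"}


def gc_content(seq: str) -> float:
    """Percentage of G and C nucleotides in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def check_design(seq: str, gc_min: float = 50.0, gc_max: float = 60.0) -> List[str]:
    """Return a list of violated criteria; an empty list means none were found."""
    seq = seq.upper()
    problems = []
    if len(seq) == 0 or len(seq) % 3 != 0:
        problems.append("length is not a positive multiple of three")
        return problems
    gc = gc_content(seq)
    if not gc_min <= gc <= gc_max:                      # criterion b)
        problems.append("GC content %.1f%% outside range" % gc)
    for motif in FORBIDDEN_MOTIFS:                      # criteria c), d), e)
        if motif in seq:
            problems.append("forbidden motif " + motif)
    if re.search(r"[GC]{7}", seq):                      # criterion f)
        problems.append("GC stretch of 7 or more nucleotides")
    if re.search(r"[AT]{5}", seq):                      # criterion g)
        problems.append("AT stretch of 5 or more nucleotides")
    for i in range(0, len(seq), 3):                     # criterion h), in frame
        codon = seq[i:i + 3]
        if codon in FORBIDDEN_CODONS:
            problems.append("forbidden codon %s at position %d" % (codon, i + 1))
    return problems
```

Such a check only reports violations; it does not by itself choose among the synonymous codons available at each position.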

I-SceI is a site-specific endonuclease, responsible for intron mobility in mitochondria in
Saccharomyces cerevisiae. The enzyme is encoded by the optional intron Sc LSU.1 of the
21S rRNA gene and initiates a double stranded DNA break at the intron insertion site
generating a 4 bp staggered cut with 3'OH overhangs. The recognition site of I-SceI
endonuclease extends over an 18 bp non-symmetrical sequence (Colleaux et al. 1988 Proc.
Natl. Acad. Sci. USA 85: 6022-6026). The amino acid sequence for I-SceI and a universal
code equivalent of the mitochondrial I-SceI gene have been provided by e.g. WO 96/14408.

WO 96/14408 discloses that the following variants of I-SceI protein are still functional:

• positions 1 to 10 can be deleted

• position 36: Gly (G) is tolerated 

• position 40: Met (M) or Val (V) are tolerated 

• position 41: Ser (S) or Asn (N) are tolerated

• position 43: Ala (A) is tolerated

• position 46: Val (V) or Asn (N) are tolerated

• position 91: Ala (A) is tolerated

• positions 123 and 156: Leu (L) is tolerated

• position 223: Ala (A) and Ser (S) are tolerated

and synthetic nucleotide sequences encoding such variant I-SceI enzymes can also be
designed and used in accordance with the current invention. 

A nucleotide sequence encoding the amino acid sequence of I-SceI, wherein the amino-terminally
located 4 amino acids have been replaced by a nuclear localization signal (SEQ
ID No 1), thus consists of 244 trinucleotides which can be represented as R1 through R244. For
each of these positions, between 1 and 6 choices of trinucleotides encoding the same
amino acid are possible. Table 1 sets forth the possible choices for the trinucleotides
encoding the amino acid sequence of SEQ ID No 1 and provides the structural requirements
(either conditional or absolute) which make it possible to avoid including in the synthetic DNA
sequence the above-mentioned "forbidden nucleotide sequences". Also provided is the
nucleotide sequence of the contiguous trinucleotides in IUPAC code.

As used herein, the symbols of the IUPAC code have their usual meaning i.e. N= A or C or
G or T; R= A or G; Y= C or T; B= C or G or T (not A); V= A or C or G (not T); D= A or G
or T (not C); H= A or C or T (not G); K= G or T; M= A or C; S= G or C; W= A or T.
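
By way of illustration only, the symbol definitions above can be written as a lookup table and used to expand a degenerate trinucleotide into the concrete trinucleotides it represents, or to test whether a concrete sequence is covered by a degenerate one. This is a minimal sketch; the names IUPAC, expand and matches are hypothetical.

```python
from itertools import product

# IUPAC nucleotide symbols and the nucleotides each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "V": "ACG", "D": "AGT", "H": "ACT", "N": "ACGT",
}


def expand(pattern: str) -> list:
    """List every concrete sequence represented by an IUPAC pattern."""
    choices = [IUPAC[symbol] for symbol in pattern.upper()]
    return ["".join(combo) for combo in product(*choices)]


def matches(pattern: str, seq: str) -> bool:
    """True if the concrete sequence seq is one of the sequences covered by pattern."""
    pattern, seq = pattern.upper(), seq.upper()
    return len(pattern) == len(seq) and all(
        base in IUPAC[symbol] for symbol, base in zip(pattern, seq)
    )
```

For instance, a trinucleotide written as TTR stands for TTA or TTG, and CTN stands for any of the four trinucleotides beginning with CT.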

Thus in one embodiment of the invention, an isolated synthetic DNA fragment is provided 
which comprises a nucleotide sequence as set forth in SEQ ID No 2, wherein the codons are 
chosen among the choices provided in such a way as to obtain a nucleotide sequence with an 
overall GC content of about 50% to about 60%, preferably about 54%-55% provided that the 
nucleotide sequence from position 28 to position 30 is not AAG; if the nucleotide sequence 
from position 34 to position 36 is AAT then the nucleotide sequence from position 37 to 
position 39 is not ATT or ATA; if the nucleotide sequence from position 34 to position 36 is
AAC then the nucleotide sequence from position 37 to position 39 is not ATT 
simultaneously with the nucleotide sequence from position 40 to position 42 being AAA; if 
the nucleotide sequence from position 34 to position 36 is AAC then the nucleotide sequence 
from position 37 to position 39 is not ATA; if the nucleotide sequence from position 37 to 
position 39 is ATT or ATA then the nucleotide sequence from position 40 to 42 is not AAA; 
the nucleotide sequence from position 49 to position 51 is not CAA; the nucleotide sequence 
from position 52 to position 54 is not GTA; the codons from the nucleotide sequence from 
position 58 to position 63 are chosen according to the choices provided in such a way that 
the resulting nucleotide sequence does not comprise ATTTA; if the nucleotide sequence 
from position 67 to position 69 is CCC then the nucleotide sequence from position 70 to 
position 72 is not AAT; if the nucleotide sequence from position 76 to position 78 is AAA 
then the nucleotide sequence from position 79 to position 81 is not TTG simultaneously with 
the nucleotide sequence from position 82 to 84 being CTN; if the nucleotide sequence from 
position 79 to position 81 is TTA or CTA then the nucleotide sequence from position 82 to 
position 84 is not TTA; the nucleotide sequence from position 88 to position 90 is not GAA; 
if the nucleotide sequence from position 91 to position 93 is TAT, then the nucleotide 
sequence from position 94 to position 96 is not AAA; if the nucleotide sequence from 
position 97 to position 99 is TCC or TCG or AGC then the nucleotide
sequence from position 100 to 102 is not CCA simultaneously with the nucleotide sequence
from position 103 to 105 being TTR; if the nucleotide sequence from position 100 to 102 is
CAA then the nucleotide sequence from position 103 to 105 is not TTA; if the nucleotide 
sequence from position 109 to position 111 is GAA then the nucleotide sequence from 112 
to 114 is not TTA; if the nucleotide sequence from position 115 to 117 is AAT then the 
nucleotide sequence from position 118 to position 120 is not ATT or ATA; if the nucleotide
sequence from position 121 to 123 is GAG then the nucleotide sequence from position 124 
to position 126; the nucleotide sequence from position 133 to 135 is not GCA; the nucleotide 
sequence from position 139 to position 141 is not ATT; if the nucleotide sequence from 
position 142 to position 144 is GGA then the nucleotide sequence from position 145 to 
position 147 is not TTA; if the nucleotide sequence from position 145 to position 147 is TTA 
then the nucleotide sequence from position 148 to position 150 is not ATA simultaneously 
with the nucleotide sequence from position 151 to 153 being TTR; if the nucleotide sequence 
from position 145 to position 147 is CTA then the nucleotide sequence from position 148 to 
position 150 is not ATA simultaneously with the nucleotide sequence from position 151 to 
153 being TTR; if the nucleotide sequence from position 148 to position 150 is ATA then the 
nucleotide sequence from position 151 to position 153 is not CTA or TTG; if the nucleotide 
sequence from position 160 to position 162 is GCA then the nucleotide sequence from 
position 163 to position 165 is not TAC; if the nucleotide sequence from position 163 to 
position 165 is TAT then the nucleotide sequence from position 166 to position 168 is not 
ATA simultaneously with the nucleotide sequence from position 169 to position 171 being 
AGR; the codons from the nucleotide sequence from position 172 to position 177 are chosen 
according to the choices provided in such a way that the resulting nucleotide sequence does 
not comprise GCAGG; the codons from the nucleotide sequence from position 178 to 
position 186 are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise AGGTA; if the nucleotide sequence from position 
193 to position 195 is TAT, then the nucleotide sequence from position 196 to position 198 
is not TGC; the nucleotide sequence from position 202 to position 204 is not CAA; the 
nucleotide sequence from position 217 to position 219 is not AAT; if the nucleotide 
sequence from position 220 to position 222 is AAA then the nucleotide sequence from 
position 223 to position 225 is not GCA; if the nucleotide sequence from position 223 to 
position 225 is GCA then the nucleotide sequence from position 226 to position 228 is not 
TAC; if the nucleotide sequence from position 253 to position 255 is GAC, then the 
nucleotide sequence from position 256 to position 258 is not CAA; if the nucleotide 
sequence from position 277 to position 279 is CAT, then the nucleotide sequence from 
position 280 to position 282 is not AAA; the codons from the nucleotide sequence from 
position 298 to position 303 are chosen according to the choices provided in such a way that 
the resulting nucleotide sequence does not comprise ATTTA; if the nucleotide sequence 
from position 304 to position 306 is GGC then the nucleotide sequence from position 307 to 
position 309 is not AAT; the codons from the nucleotide sequence from position 307 to 
position 312 are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; the codons from the nucleotide sequence 
from position 334 to position 342 are chosen according to the choices provided in such a way 
that the resulting nucleotide sequence does not comprise ATTTA; if the nucleotide sequence 
from position 340 to position 342 is AAG then the nucleotide sequence from position 343 to 
345 is not CAT; if the nucleotide sequence from position 346 to position 348 is CAA then the 
nucleotide sequence from position 349 to position 351 is not GCA; the codons from the 
nucleotide sequence from position 349 to position 357 are chosen according to the choices 
provided in such a way that the resulting nucleotide sequence does not comprise ATTTA; 
the nucleotide sequence from position 355 to position 357 is not AAT; if the nucleotide 
sequence from position 358 to position 360 is AAA then the nucleotide sequence from 
position 361 to 363 is not TTG; if the nucleotide sequence from position 364 to position 366 
is GCC then the nucleotide sequence from position 367 to position 369 is not AAT; the 
codons from the nucleotide sequence from position 367 to position 378 are chosen according 
to the choices provided in such a way that the resulting nucleotide sequence does not 
comprise ATTTA; if the nucleotide sequence from position 382 to position 384 is AAT then 
the nucleotide sequence from position 385 to position 387 is not AAT; the nucleotide 
sequence from position 385 to position 387 is not AAT; if the nucleotide sequence from 
position 400 to 402 is CCC, then the nucleotide sequence from position 403 to 405 is not 
AAT; if the nucleotide sequence from position 403 to 405 is AAT, then the nucleotide 
sequence from position 406 to 408 is not AAT; the codons from the nucleotide sequence 
from position 406 to position 411 are chosen according to the choices provided in such a 
way that the resulting nucleotide sequence does not comprise ATTTA; the codons from the 
nucleotide sequence from position 421 to position 426 are chosen according to the choices 
provided in such a way that the resulting nucleotide sequence does not comprise ATTTA; 
the nucleotide sequence from position 430 to position 432 is not CCA; if the nucleotide 
sequence from position 436 to position 438 is TCA then the nucleotide sequence from 
position 439 to position 441 is not TTG; the nucleotide sequence from position 445 to 
position 447 is not TAT; the nucleotide sequence from position 481 to 483 is not AAT; 
if the nucleotide sequence from position 484 to position 486 is AAA, then the nucleotide 
sequence from position 487 to position 489 is not AAT simultaneously with the nucleotide 
sequence from position 490 to position 492 being AGY; if the nucleotide sequence from 
position 490 to position 492 is TCA, then the nucleotide sequence from position 493 to 
position 495 is not ACC simultaneously with the nucleotide sequence from position 496 to 
498 being AAY; if the nucleotide sequence from position 493 to position 495 is ACC, then 
the nucleotide sequence from position 496 to 498 is not AAT; the nucleotide sequence from 
position 496 to position 498 is not AAT; if the nucleotide sequence from position 499 to 
position 501 is AAA then the nucleotide sequence from position 502 to position 504 is not 
TCA or AGC; if the nucleotide sequence from position 508 to position 510 is GTA, then the 
nucleotide sequence from position 511 to 513 is not TTA; if the nucleotide sequence from 
position 514 to position 516 is AAT then the nucleotide sequence from position 517 to 
position 519 is not ACA; if the nucleotide sequence from position 517 to position 519 is 
ACC or ACG, then the nucleotide sequence from position 520 to position 522 is not CAA 
simultaneously with the nucleotide sequence from position 523 to position 525 being TCN; 
the codons from the nucleotide sequence from position 523 to position 531 are chosen 
according to the choices provided in such a way that the resulting nucleotide sequence does 
not comprise ATTTA; if the nucleotide sequence from position 544 to position 546 is GAA 
then the nucleotide sequence from position 547 to position 549 is not TAT, simultaneously 
with the nucleotide sequence from position 550 to position 552 being TTR; the codons from 
the nucleotide sequence from position 547 to position 552 are chosen according to the 
choices provided in such a way that the resulting nucleotide sequence does not comprise 
ATTTA; if the nucleotide sequence from position 559 to position 561 is GGA then the 
nucleotide sequence from position 562 to position 564 is not TTG simultaneously with the 
nucleotide sequence from position 565 to 567 being CGN; if the nucleotide sequence from 
position 565 to position 567 is CGC then the nucleotide sequence from position 568 to 
position 570 is not AAT; the nucleotide sequence from position 568 to position 570 is not 
AAT; if the nucleotide sequence from position 574 to position 576 is TTC then the 
nucleotide sequence from position 577 to position 579 is not CAA simultaneously with the 
nucleotide sequence from position 580 to position 582 being TTR; if the nucleotide sequence 
from position 577 to position 579 is CAA then the nucleotide sequence from position 580 to 
position 582 is not TTA; if the nucleotide sequence from position 583 to position 585 is 
AAT then the nucleotide sequence from position 586 to 588 is not TGC; the nucleotide sequence 
from position 595 to position 597 is not AAA; if the nucleotide sequence from position 598 
to position 600 is ATT then the nucleotide sequence from position 601 to position 603 is not 
AAT; the nucleotide sequence from position 598 to position 600 is not ATA; the nucleotide 
sequence from position 601 to position 603 is not AAT; if the nucleotide sequence from 
position 604 to position 606 is AAA then the nucleotide sequence from position 607 to 
position 609 is not AAT; the nucleotide sequence from position 607 to position 609 is not 
AAT; the nucleotide sequence from position 613 to position 615 is not CCA; if the 
nucleotide sequence from position 613 to position 615 is CCG, then the nucleotide sequence 
from position 616 to position 618 is not ATA; if the nucleotide sequence from position 616 
to position 618 is ATA, then the nucleotide sequence from position 619 to 
621 is not ATA; if the nucleotide sequence from position 619 to position 621 is ATA, then 
the nucleotide sequence from position 622 to position 624 is not TAC; the nucleotide 
sequence from position 619 to position 621 is not ATT; the codons from the nucleotide 
sequence from position 640 to position 645 are chosen according to the choices provided in 
such a way that the resulting nucleotide sequence does not comprise ATTTA; if the 
nucleotide sequence from position 643 to position 645 is TTA then the nucleotide sequence 
from position 646 to position 648 is not ATA; if the nucleotide sequence from position 643 
to position 645 is CTA then the nucleotide sequence from position 646 to position 648 is not 
ATA; the codons from the nucleotide sequence from position 655 to position 660 are chosen 
according to the choices provided in such a way that the resulting nucleotide sequence does 
not comprise ATTTA; if the nucleotide sequence from position 658 to 660 is TTA or CTA 
then the nucleotide sequence from position 661 to position 663 is not ATT or ATC; the 
nucleotide sequence from position 661 to position 663 is not ATA; if the nucleotide 
sequence from position 661 to position 663 is ATT then the nucleotide sequence from 
position 664 to position 666 is not AAA; the codons from the nucleotide sequence from 
position 670 to position 675 are chosen according to the choices provided in such a way that 
the resulting nucleotide sequence does not comprise ATTTA; if the nucleotide sequence 
from position 691 to position 693 is TAT then the nucleotide sequence from position 694 to 
position 696 is not AAA; if the nucleotide sequence from position 694 to position 696 is 
AAA then the nucleotide sequence from position 697 to position 699 is not TTG; if the 
nucleotide sequence from position 700 to position 702 is CCC then the nucleotide sequence 
from position 703 to position 705 is not AAT; if the nucleotide sequence from position 703 
to position 705 is AAT then the nucleotide sequence from position 706 to position 708 is not 
ACA or ACT; if the nucleotide sequence from position 706 to position 708 is ACA then the 
nucleotide sequence from position 709 to 711 is not ATA simultaneously with the nucleotide 
sequence from position 712 to position 714 being AGY; the nucleotide sequence does not 
comprise the codons TTA, CTA, ATA, GTA, TCG, CCG, ACG and GCG; said nucleotide 
sequence does not comprise a GC stretch consisting of 7 consecutive nucleotides selected 
from the group of G or C; and the nucleotide sequence does not comprise an AT stretch 
consisting of 5 consecutive nucleotides selected from the group of A or T. 
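
The sequence-wide restrictions at the end of the preceding paragraph, and the conditional restrictions on neighbouring codons, lend themselves to a mechanical check. The following Python sketch is an illustration only and not part of the disclosed methods; the function names are hypothetical, positions are 1-based as in the text, and the candidate sequence is assumed to be an in-frame coding sequence written with the letters A, C, G and T.

```python
import re

# Codons that, according to the text, must not occur in the nucleotide sequence.
FORBIDDEN_CODONS = {"TTA", "CTA", "ATA", "GTA", "TCG", "CCG", "ACG", "GCG"}


def codon_at(seq, start):
    """Return the codon whose first nucleotide is at 1-based position `start`."""
    return seq[start - 1:start + 2]


def global_violations(seq):
    """List violations of the sequence-wide rules (hypothetical helper)."""
    seq = seq.upper()
    problems = []
    # Walk the sequence codon by codon, reading frame starting at position 1.
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in FORBIDDEN_CODONS:
            problems.append(f"forbidden codon {codon} at position {i + 1}")
    # A stretch of 7 consecutive nucleotides selected from G or C.
    if re.search(r"[GC]{7}", seq):
        problems.append("GC stretch of 7 or more nucleotides")
    # A stretch of 5 consecutive nucleotides selected from A or T.
    # ATTTA is itself such a stretch, so this check also catches it.
    if re.search(r"[AT]{5}", seq):
        problems.append("AT stretch of 5 or more nucleotides")
    return problems


def violates_pair_rule(seq, first_start, first_codons, second_start, banned_codons):
    """True if the codon at `first_start` is one of `first_codons` while the
    codon at `second_start` is one of `banned_codons`."""
    seq = seq.upper()
    return (codon_at(seq, first_start) in first_codons
            and codon_at(seq, second_start) in banned_codons)
```

For instance, the restriction "if the nucleotide sequence from position 67 to position 69 is CCC then the nucleotide sequence from position 70 to position 72 is not AAT" would be tested on a full-length candidate as `violates_pair_rule(seq, 67, {"CCC"}, 70, {"AAT"})`. Restrictions that involve three consecutive codons ("simultaneously with") would need an analogous three-codon helper, which is not shown here.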

A preferred group of synthetic nucleotide sequences is set forth in Table 2 and corresponds 
to an isolated synthetic DNA fragment which comprises a nucleotide sequence as 
set forth in SEQ ID No 3, wherein the codons are chosen among the choices provided in such 
a way as to obtain a nucleotide sequence with an overall GC content of about 50% to about 
60%, preferably about 54%-55% provided that if the nucleotide sequence from position 121 
to position 123 is GAG then the nucleotide sequence from position 124 to 126 is not CAA; if 
the nucleotide sequence from position 253 to position 255 is GAC then the nucleotide 
sequence from position 256 to 258 is not CAA; if the nucleotide sequence from position 277 
to position 279 is CAT then the nucleotide sequence from position 280 to 282 is not AAA; if 
the nucleotide sequence from position 340 to position 342 is AAG then the nucleotide 
sequence from position 343 to position 345 is not CAT; if the nucleotide sequence from 
position 490 to position 492 is TCA then the nucleotide sequence from position 493 to 
position 495 is not ACC; if the nucleotide sequence from position 499 to position 501 is 
AAA then the nucleotide sequence from position 502 to 504 is not TCA or AGC; if the 
nucleotide sequence from position 517 to position 519 is ACC then the nucleotide sequence 
from position 520 to position 522 is not CAA simultaneously with the nucleotide sequence 
from position 523 to 525 being TCN; if the nucleotide sequence from position 661 to 
position 663 is ATT then the nucleotide sequence from position 664 to position 666 is not 
AAA; the codons from the nucleotide sequence from position 7 to position 15 are chosen 
according to the choices provided in such a way that the resulting nucleotide sequence does 
not comprise a stretch of seven contiguous nucleotides from the group of G or C; the codons 
from the nucleotide sequence from position 61 to position 69 are chosen according to the 
choices provided in such a way that the resulting nucleotide sequence does not comprise a 
stretch of seven contiguous nucleotides from the group of G or C; the codons from the 
nucleotide sequence from position 130 to position 138 are chosen according to the choices 
provided in such a way that the resulting nucleotide sequence does not comprise a stretch of 
seven contiguous nucleotides from the group of G or C; the codons from the nucleotide 
sequence from position 268 to position 279 are chosen according to the choices provided in 
such a way that the resulting nucleotide sequence does not comprise a stretch of seven 
contiguous nucleotides from the group of G or C; the codons from the nucleotide sequence 
from position 322 to position 333 are chosen according to the choices provided in such a way 
that the resulting nucleotide sequence does not comprise a stretch of seven contiguous 
nucleotides from the group of G or C; the codons from the nucleotide sequence from position 
460 to position 468 are chosen according to the choices provided in such a way that the 
resulting nucleotide sequence does not comprise a stretch of seven contiguous nucleotides 
from the group of G or C; the codons from the nucleotide sequence from position 13 to 
position 27 are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise a stretch of five contiguous nucleotides from the 
group of A or T; the codons from the nucleotide sequence from position 37 to position 48 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group of A or 
T; the codons from the nucleotide sequence from position 184 to position 192 are chosen 
according to the choices provided in such a way that the resulting nucleotide sequence does 
not comprise a stretch of five contiguous nucleotides from the group of A or T; the codons 
from the nucleotide sequence from position 214 to position 219 are chosen according to the 
choices provided in such a way that the resulting nucleotide sequence does not comprise a 
stretch of five contiguous nucleotides from the group of A or T; the codons from the 
nucleotide sequence from position 277 to position 285 are chosen according to the choices 
provided in such a way that the resulting nucleotide sequence does not comprise a stretch of 
five contiguous nucleotides from the group of A or T; the codons from the nucleotide 
sequence from position 388 to position 396 are chosen according to the choices provided in 
such a way that the resulting nucleotide sequence does not comprise a stretch of five 
contiguous nucleotides from the group of A or T; the codons from the nucleotide sequence 
from position 466 to position 474 are chosen according to the choices provided in such a way 
that the resulting nucleotide sequence does not comprise a stretch of five contiguous 
nucleotides from the group of A or T; the codons from the nucleotide sequence from position 
484 to position 489 are chosen according to the choices provided in such a way that the 
resulting nucleotide sequence does not comprise a stretch of five contiguous nucleotides 
from the group of A or T; the codons from the nucleotide sequence from position 571 to 
position 576 are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise a stretch of five contiguous nucleotides from the 
group of A or T; the codons from the nucleotide sequence from position 598 to position 603 
are chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group of A or 
T; the codons from the nucleotide sequence from position 604 to position 609 are chosen 
according to the choices provided in such a way that the resulting nucleotide sequence does 
not comprise a stretch of five contiguous nucleotides from the group of A or T; the codons 
from the nucleotide sequence from position 613 to position 621 are chosen according to the 
choices provided in such a way that the resulting nucleotide sequence does not comprise a 
stretch of five contiguous nucleotides from the group of A or T; the codons from the 
nucleotide sequence from position 646 to position 651 are chosen according to the choices 
provided in such a way that the resulting nucleotide sequence does not comprise a stretch of 
five contiguous nucleotides from the group of A or T; the codons from the nucleotide 
sequence from position 661 to position 666 are chosen according to the choices provided in 
such a way that the resulting nucleotide sequence does not comprise a stretch of five 
contiguous nucleotides from the group of A or T; and the codons from the nucleotide 
sequence from position 706 to position 714 are chosen according to the choices provided in 
such a way that the resulting nucleotide sequence does not comprise a stretch of five 
contiguous nucleotides from the group of A or T. 
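
The overall GC content requirement and the region-specific restrictions on G/C and A/T stretches described above can be verified computationally. The sketch below is illustrative only; the function names are hypothetical, and it assumes that a restriction on "the nucleotide sequence from position X to position Y" is checked by inspecting only that window (1-based, inclusive), which is one possible reading of the text.

```python
import re


def gc_content(seq):
    """Fraction of G and C nucleotides in the sequence (0.0 to 1.0)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def region_has_run(seq, start, end, alphabet, length):
    """True if positions start..end (1-based, inclusive) contain a run of
    `length` or more contiguous nucleotides drawn from `alphabet`."""
    region = seq.upper()[start - 1:end]
    # Builds a pattern such as "[GC]{7,}" or "[AT]{5,}".
    return re.search(f"[{alphabet}]{{{length},}}", region) is not None
```

A candidate sequence `seq` would then be expected to satisfy, for example, `0.50 <= gc_content(seq) <= 0.60`, `not region_has_run(seq, 7, 15, "GC", 7)` and `not region_has_run(seq, 13, 27, "AT", 5)`, with one such call for each region listed in the paragraph above.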

The nucleotide sequence of SEQ ID No 4 is an example of such a synthetic nucleotide 
sequence encoding an I-SceI endonuclease which no longer contains any of the 
nucleotide sequences or codons to be avoided. However, it will be clear that a person skilled 
in the art can readily obtain a similar sequence encoding I-SceI by replacing one or more 
(between two and twenty) of the nucleotides to be chosen for any of the alternatives provided 
in the nucleotide sequence of SEQ ID No 3 (excluding any of the forbidden combinations 
described in the preceding paragraph) and use it to obtain a similar effect. 

For expression in plant cells, the synthetic DNA fragments encoding I-SceI may be operably 
linked to a plant expressible promoter in order to obtain a plant expressible chimeric gene. 

A person skilled in the art will immediately recognize that for this aspect of the invention, it 
is not required that the repair DNA and/or the DSBI endonuclease encoding DNA are 
introduced into the plant cell by direct DNA transfer methods, but that the DNA may 
also be introduced into plant cells by Agrobacterium-mediated transformation methods as are 
available in the art. 

In yet another aspect, the invention relates to a method for introducing a foreign DNA of 
interest into a preselected site of a genome of a plant cell comprising the steps of 

(a) inducing a double stranded break at the preselected site in the genome of the cell; 

(b) introducing the foreign DNA of interest into the plant cell; 

characterized in that prior to step (a), the plant cells are incubated in a plant phenolic 
compound. 

"Plant phenolic compounds" or "plant phenolics" suitable for the invention are those 
substituted phenolic molecules which are capable of inducing a positive chemotactic response, 
particularly those which are capable of inducing increased vir gene expression in a Ti-plasmid 
containing Agrobacterium sp., particularly a Ti-plasmid containing Agrobacterium 
tumefaciens. Methods to measure chemotactic response towards plant phenolic compounds 
have been described by Ashby et al. (1988 J. Bacteriol. 170: 4181-4187) and methods to 
measure induction of vir gene expression are also well known (Stachel et al., 1985 Nature 
318: 624-629; Bolton et al. 1986 Science 232: 983-985). Preferred plant phenolic compounds 
are those found in wound exudates of plant cells. One of the best known plant phenolic 
compounds is acetosyringone, which is present in a number of wounded and intact cells of 
various plants, albeit in different concentrations. However, acetosyringone 
(3,5-dimethoxy-4-hydroxyacetophenone) is not the only plant phenolic which can induce the 
expression of vir genes. Other examples are α-hydroxy-acetosyringone, sinapinic acid 
(3,5-dimethoxy-4-hydroxycinnamic acid), syringic acid (4-hydroxy-3,5-dimethoxybenzoic acid), 
ferulic acid (4-hydroxy-3-methoxycinnamic acid), catechol (1,2-dihydroxybenzene), 
p-hydroxybenzoic acid (4-hydroxybenzoic acid), β-resorcylic acid (2,4-dihydroxybenzoic 
acid), protocatechuic acid (3,4-dihydroxybenzoic acid), pyrrogallic acid 
(2,3,4-trihydroxybenzoic acid), gallic acid (3,4,5-trihydroxybenzoic acid) and vanillin 
(3-methoxy-4-hydroxybenzaldehyde). As used herein, the mentioned molecules are referred to as plant 
phenolic compounds. Plant phenolic compounds can be added to the plant culture medium 
either alone or in combination with other plant phenolic compounds. Although not intending 
to limit the invention to a particular mode of action, it is thought that the apparent 
stimulating effect of these plant phenolics on cell division (and thus also genome replication) 
may enhance targeted insertion of foreign DNA. 

Plant cells are preferably incubated in a plant phenolic compound for about one week, 
although it is expected that incubation for about one or two days in or on a plant phenolic 
compound will be sufficient. Plant cells should be incubated for a time sufficient to stimulate 
cell division. According to Guivarc'h et al. (1993, Protoplasma 174: 10-18) such an effect may 
already be obtained by incubation of plant cells for as little as 10 minutes. 

The above mentioned improved methods for homologous recombination based targeted 
DNA insertion may also be applied to improve the quality of the transgenic plant cells and 
plants obtained by direct DNA transfer methods, particularly by microprojectile 
bombardment. It is well known in the art that introduction of DNA by microprojectile 
bombardment frequently leads to complex integration patterns of the introduced DNA 
(integration of multiple copies of the foreign DNA of interest, either complete or partial, 
generation of repeat structures). Nevertheless, some plant genotypes or varieties may be 
more amenable to transformation using microprojectile bombardment than to transformation 
using e.g. Agrobacterium tumefaciens. It would thus be advantageous if the quality of the 
transgenic plant cells or plants obtained through microprojectile bombardment could be 
improved, i.e. if the pattern of integration of the foreign DNA could be influenced to be 
simpler. 

The above mentioned finding that introduction of foreign DNA through microprojectile 
bombardment in the presence of an induced double stranded DNA break in the nuclear 
genome, whereby the foreign DNA has homology to the sequences flanking the double 
stranded DNA break, frequently (in about 50% of the obtained events) leads to simple 
integration patterns (single copy insertion in a predictable way and no insertion of additional 
fragments of the foreign DNA), provides the basis for a method of simplifying the complexity 
of insertion of foreign DNA in the nuclear genome of plant cells. 

Thus the invention also relates to a method of producing a transgenic plant by 
microprojectile bombardment comprising the steps of 

(a) inducing a double stranded DNA break at a preselected site in the genome of a cell of a 
plant, in accordance with the methods described elsewhere in this document or 
available in the art; and 

(b) introducing the foreign DNA of interest into the plant cell by microprojectile 
bombardment wherein said foreign DNA of interest is flanked by two DNA regions 
having at least 80% sequence identity to the DNA regions flanking the preselected 
site in the genome of the plant. 

A significant portion of the transgenic plant population thus obtained will have a simple 
integration pattern of the foreign DNA in the genome of the plant cells, more particularly a 
significant portion of the transgenic plants will only have a one copy insertion of the foreign 
DNA, exactly between the two DNA regions flanking the preselected site in the genome of 
the plant. This proportion is higher than the proportion of transgenic plants with simple 
integration patterns obtained when the plants are produced by simple microprojectile 
bombardment without inducing a double stranded DNA break and without providing the 
foreign DNA with homology to the genomic regions flanking the preselected site. 
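
The requirement in step (b) that the flanking DNA regions have at least 80% sequence identity to the genomic regions flanking the preselected site can be illustrated with a simple calculation. The sketch below is a deliberate simplification and an assumption of this illustration: it compares two sequences of equal length position by position, without gaps, whereas sequence identity is normally determined on an optimal alignment. The function names are hypothetical.

```python
def percent_identity(a, b):
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def flanks_qualify(repair_left, repair_right, genome_left, genome_right,
                   threshold=80.0):
    """True if both flanking regions of the foreign DNA reach the identity
    threshold against the corresponding genomic flanking regions."""
    return (percent_identity(repair_left, genome_left) >= threshold
            and percent_identity(repair_right, genome_right) >= threshold)
```

With a ten-nucleotide toy example, `percent_identity("ACGTACGTAC", "ACGTACGTAA")` gives 90.0, which would satisfy the 80% threshold; flanking regions used in practice are of course much longer than this.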

In a convenient embodiment of the invention, the target plant cell comprises in its genome a 
marker gene, flanked by two recognition sites for a rare-cleaving double stranded DNA 
break inducing endonuclease, one on each side. This marker DNA may be introduced in the 
genome of the plant cell of interest using any method of transformation, or may have been 
introduced into the genome of a plant cell of another plant line or variety (such as a plant 
line or variety easily amenable to transformation) and introduced into the plant cell of interest 
by classical breeding techniques. Preferably, the population of transgenic plants or plant cells 
comprising a marker gene flanked by two recognition sites for a rare-cleaving double 
stranded break inducing endonuclease has been analysed for the expression pattern of the 
marker gene (such as high expression, temporally or spatially regulated expression) and the 
plant lines with the desired expression pattern identified. Production of a transgenic plant by 
microprojectile bombardment comprising the steps of 

(a) inducing a double stranded DNA break at a preselected site in the genome of a cell of 
a plant, in accordance with the methods described elsewhere in this document or 
available in the art; and 

(b) introducing the foreign DNA of interest into the plant cell by microprojectile 
bombardment wherein said foreign DNA of interest is flanked by two DNA regions 
having at least 80% sequence identity to the DNA regions flanking the preselected 
site in the genome of the plant; 

will lead to transgenic plant cells and plants wherein the marker gene has been replaced 
by the foreign DNA of interest. 

The marker gene may be any selectable or a screenable plant-expressible marker gene, which 
is preferably a conventional chimeric marker gene. The chimeric marker gene can comprise a 
marker DNA that is under the control of, and operatively linked at its 5' end to, a promoter, 
preferably a constitutive plant-expressible promoter, such as a CaMV 35S promoter, or a 
light inducible promoter such as the promoter of the gene encoding the small subunit of 
Rubisco; and operatively linked at its 3' end to suitable plant transcription termination and 
polyadenylation signals. The marker DNA preferably encodes an RNA, protein or 
polypeptide which, when expressed in the cells of a plant, allows such cells to be readily 
separated from those cells in which the marker DNA is not expressed. The choice of the 
marker DNA is not critical, and any suitable marker DNA can be selected in a well known 
manner. For example, a marker DNA can encode a protein that provides a distinguishable 
color to the transformed plant cell, such as the A1 gene (Meyer et al (1987), Nature 330: 
677), can encode a fluorescent protein [Chalfie et al Science 263: 802-805 (1994); Crameri 
et al Nature Biotechnology 14: 315-319 (1996)], can encode a protein that provides 
herbicide resistance to the transformed plant cell, such as the bar gene, encoding PAT which 
provides resistance to phosphinothricin (EP 0242246), or can encode a protein that provides 
antibiotic resistance to the transformed cells, such as the aac(6') gene, encoding GAT which 
provides resistance to gentamycin (WO 94/01560). Such a selectable marker gene generally 
encodes a protein that confers to the cell resistance to an antibiotic or other chemical 
compound that is normally toxic for the cells. In plants the selectable marker gene may thus 
also encode a protein that confers resistance to a herbicide, such as a herbicide comprising a 
glutamine synthetase inhibitor (e.g. phosphinothricin) as an active ingredient. Examples of 
such genes are genes encoding phosphinothricin acetyl transferase such as the sfr or sfrv 
genes (EP 242236; EP 242246; De Block et al., 1987 EMBO J. 6: 2513-2518). 

The introduced repair DNA may further comprise a marker gene that allows better 
discrimination between integration by homologous recombination at the preselected site and 
integration elsewhere in the genome. Such marker genes are available in the art and 
include marker genes whereby the absence of the marker gene can be positively selected for 
under selective conditions (e.g. codA, cytosine deaminase from E. coli conferring 
sensitivity to 5-fluoro cytosine, Perera et al 1993 Plant Mol. Biol. 23, 793; Stougaard (1993) 
Plant J. : 755). The repair DNA needs to comprise the marker gene in such a way that 
integration of the repair DNA into the nuclear genome in a random way results in the 
presence of the marker gene whereas the integration of the repair DNA by homologous 
recombination results in the absence of the marker gene. 
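
The logic of such a marker gene can be summarised in a toy model. In the sketch below, which is an illustration under stated assumptions and not a description of any actual construct, the repair DNA is represented as an ordered list of hypothetical segment names, with the codA marker placed outside the two flanking regions of homology. Integration by homologous recombination is modelled as inserting only what lies between the flanking regions, whereas random integration is modelled as inserting the complete repair DNA.

```python
# Hypothetical layout: the counter-selectable marker lies outside the flanks.
REPAIR_DNA = ["codA", "left_flank", "foreign_DNA", "right_flank"]


def inserted_segments(repair_dna, by_homologous_recombination):
    """Return the segments that end up newly inserted in the genome."""
    if by_homologous_recombination:
        i = repair_dna.index("left_flank")
        j = repair_dna.index("right_flank")
        # Only the DNA between the two regions of homology is inserted.
        return repair_dna[i + 1:j]
    # Random integration: the whole repair DNA, marker included, is inserted.
    return list(repair_dna)


def survives_5fc(segments):
    """Cells carrying codA are sensitive to 5-fluoro cytosine."""
    return "codA" not in segments
```

Under these assumptions, `survives_5fc(inserted_segments(REPAIR_DNA, True))` is true and `survives_5fc(inserted_segments(REPAIR_DNA, False))` is false, so that selection on 5-fluoro cytosine enriches for targeted insertion events.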

It will be immediately clear that the same results can also be obtained using only one 
preselected site at which to induce the double stranded break, which is located in or near a 
marker gene. The flanking regions of homology are then preferably chosen in such a way as to 
either inactivate the marker gene, or delete the marker gene and substitute it with the foreign 
DNA to be inserted. 

It will be appreciated that the means and methods of the invention are particularly useful for 
corn, but may also be used in other plants with similar effects, particularly in cereal plants 
including wheat, oat, barley, rye, rice, turfgrass, sorghum, millet or sugarcane plants. The 
methods of the invention can also be applied to any plant including but not limited to cotton, 
tobacco, canola, oilseed rape, soybean, vegetables, potatoes, Lemna spp., Nicotiana spp., 
Arabidopsis, alfalfa, barley, bean, corn, cotton, flax, pea, rape, rice, rye, safflower, sorghum, 
soybean, sunflower, tobacco, wheat, asparagus, beet, broccoli, cabbage, carrot, cauliflower, 
celery, cucumber, eggplant, lettuce, onion, oilseed rape, pepper, potato, pumpkin, radish, 
spinach, squash, tomato, zucchini, almond, apple, apricot, banana, blackberry, blueberry, 
cacao, cherry, coconut, cranberry, date, grape, grapefruit, guava, kiwi, lemon, lime, mango, 
melon, nectarine, orange, papaya, passion fruit, peach, peanut, pear, pineapple, pistachio, 
plum, raspberry, strawberry, tangerine, walnut and watermelon. 
It is also an object of the invention to provide plant cells and plants comprising foreign DNA
molecules inserted at preselected sites, according to the methods of the invention. Gametes,
seeds, embryos, either zygotic or somatic, progeny or hybrids of plants comprising the
targeted DNA insertion events, which are produced by traditional breeding methods, are also
included within the scope of the present invention.

The plants obtained by the methods described herein may be further crossed by traditional 
breeding techniques with other plants to obtain progeny plants comprising the targeted DNA 
insertion events obtained according to the present invention. 

The following non-limiting Examples describe the design of a modified I-SceI encoding
chimeric gene, and the use thereof to insert foreign DNA into a preselected site of the plant
genome.

Unless stated otherwise in the Examples, all recombinant DNA techniques are carried out
according to standard protocols as described in Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, NY and in
Volumes 1 and 2 of Ausubel et al. (1994) Current Protocols in Molecular Biology, Current
Protocols, USA. Standard materials and methods for plant molecular work are described in
Plant Molecular Biology Labfax (1993) by R.D.D. Croy, jointly published by BIOS
Scientific Publications Ltd (UK) and Blackwell Scientific Publications, UK. Other references
for standard molecular biology techniques include Sambrook and Russell (2001) Molecular
Cloning: A Laboratory Manual, Third Edition, Cold Spring Harbor Laboratory Press, NY, and
Volumes I and II of Brown (1998) Molecular Biology LabFax, Second Edition, Academic
Press (UK). Standard materials and methods for polymerase chain reactions can be found in
Dieffenbach and Dveksler (1995) PCR Primer: A Laboratory Manual, Cold Spring Harbor
Laboratory Press, and in McPherson et al. (2000) PCR - Basics: From Background to Bench,
First Edition, Springer Verlag, Germany.

Throughout the description and Examples, reference is made to the following sequences: 

SEQ ID No 1: amino acid sequence of a chimeric I-SceI comprising a nuclear localization
signal linked to an I-SceI protein lacking the 4 amino-terminal amino acids.
SEQ ID No 2: nucleotide sequence of I-SceI coding region (IUPAC code).
SEQ ID No 3: nucleotide sequence of synthetic I-SceI coding region (IUPAC code).
SEQ ID No 4: nucleotide sequence of synthetic I-SceI coding region.
SEQ ID No 5: nucleotide sequence of the T-DNA of pTTAM78 (target locus).
SEQ ID No 6: nucleotide sequence of the T-DNA of pTTA82 (repair DNA).
SEQ ID No 7: nucleotide sequence of pCV78.
Trinucleotide | AA | Possible trinucleotides | IUPAC code | Proviso
R1 | M | ATG | ATG |
R2 | A | GCA GCC GCG GCT | GCN |
R3 | K | AAA AAG | AAR |
R4 | P | CCA CCC CCG CCT | CCN |
R5 | P | CCA CCC CCG CCT | CCN |
R6 | K | AAA AAG | AAR |
R7 | K | AAA AAG | AAR |
R8 | K | AAA AAG | AAR |
R9 | R | AGA AGG CGA CGC CGG CGT | AGR or CGN |
R10 | K | AAA AAG | AAR | NOT AAG
R11 | V | GTA GTC GTG GTT | GTN |
R12 | N | AAC AAT | AAY | IF R12 AAT NOT (R13 ATT OR R13 ATA); IF R12 AAC NOT (R13 ATT AND R14 AAA); IF R12 AAC NOT R13 ATA
R13 | I | ATA ATC ATT | ATH | IF R13 ATT NOT R14 AAA; IF R13 ATA NOT R14 AAA
R14 | K | AAA AAG | AAR |
R15 | K | AAA AAG | AAR |
R16 | N | AAC AAT | AAY |
R17 | Q | CAA CAG | CAR | NOT CAA
R18 | V | GTA GTC GTG GTT | GTN | NOT GTA
R19 | M | ATG | ATG |
R20 | N | AAC AAT | AAY | AVOID ATTTA
R21 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R22 | G | GGA GGC GGG GGT | GGN |
R23 | P | CCA CCC CCG CCT | CCN | IF R23 CCC NOT R24 AAT
R24 | N | AAC AAT | AAY |
R25 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN |
R26 | K | AAA AAG | AAR | IF R26 AAA NOT (R27 TTG AND R28 CTN)
R27 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN | IF R27 (TTA OR CTA) NOT R28 TTA
R28 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R29 | K | AAA AAG | AAR |
R30 | E | GAA GAG | GAR | NOT GAA
R31 | Y | TAC TAT | TAY | IF R31 TAT NOT R32 AAA
Trinucleotide | AA | Possible trinucleotides | IUPAC code | Proviso
R32 | K | AAA AAG | AAR |
R33 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN | IF R33 (TCC OR TCG OR AGC) NOT (R34 CAA AND R35 TTR)
R34 | Q | CAA CAG | CAR | IF R34 CAA NOT R35 TTA
R35 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R36 | I | ATA ATC ATT | ATH |
R37 | E | GAA GAG | GAR | IF R37 GAA NOT R38 TTA
R38 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R39 | N | AAC AAT | AAY | IF R39 AAT NOT R40 (ATT OR ATA)
R40 | I | ATA ATC ATT | ATH |
R41 | E | GAA GAG | GAR | IF R41 GAG NOT R42 CAA
R42 | Q | CAA CAG | CAR |
R43 | F | TTC TTT | TTY |
R44 | E | GAA GAG | GAR |
R45 | A | GCA GCC GCG GCT | GCN | NOT GCA
R46 | G | GGA GGC GGG GGT | GGN |
R47 | I | ATA ATC ATT | ATH | NOT ATT
R48 | G | GGA GGC GGG GGT | GGN | IF R48 GGA NOT R49 TTA
R49 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN | IF R49 TTA NOT (R50 ATA AND R51 TTR); IF R49 CTA NOT (R50 ATA AND R51 TTR)
R50 | I | ATA ATC ATT | ATH | IF R50 ATA NOT R51 (CTA OR TTG)
R51 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R52 | G | GGA GGC GGG GGT | GGN |
R53 | D | GAC GAT | GAY |
R54 | A | GCA GCC GCG GCT | GCN | IF R54 GCA NOT R55 TAC
R55 | Y | TAC TAT | TAY | IF R55 TAT NOT (R56 ATA AND R57 AGR)
R56 | I | ATA ATC ATT | ATH |
R57 | R | AGA AGG CGA CGC CGG CGT | AGR or CGN |
R58 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN | AVOID GCAGG
R59 | R | AGA AGG CGA CGC CGG CGT | AGR or CGN |
R60 | D | GAC GAT | GAY |
R61 | E | GAA GAG | GAR | AVOID AAGGT
R62 | G | GGA GGC GGG GGT | GGN |
R63 | K | AAA AAG | AAR |
R64 | T | ACA ACC ACG ACT | ACN |
R65 | Y | TAC TAT | TAY | IF R65 TAT NOT R66 TGC
Trinucleotide | AA | Possible trinucleotides | IUPAC code | Proviso
R66 | C | TGC TGT | TGY |
R67 | M | ATG | ATG |
R68 | Q | CAA CAG | CAR | NOT CAA
R69 | F | TTC TTT | TTY |
R70 | E | GAA GAG | GAR |
R71 | W | TGG | TGG |
R72 | K | AAA AAG | AAR |
R73 | N | AAC AAT | AAY | NOT AAT
R74 | K | AAA AAG | AAR | IF R74 AAA NOT R75 GCA
R75 | A | GCA GCC GCG GCT | GCN | IF R75 GCA NOT R76 TAC
R76 | Y | TAC TAT | TAY |
R77 | M | ATG | ATG |
R78 | D | GAC GAT | GAY |
R79 | H | CAC CAT | CAY |
R80 | V | GTA GTC GTG GTT | GTN |
R81 | C | TGC TGT | TGY |
R82 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R83 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R84 | Y | TAC TAT | TAY |
R85 | D | GAC GAT | GAY | IF R85 GAC NOT R86 CAA
R86 | Q | CAA CAG | CAR |
R87 | W | TGG | TGG |
R88 | V | GTA GTC GTG GTT | GTN |
R89 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R90 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN |
R91 | P | CCA CCC CCG CCT | CCN |
R92 | P | CCA CCC CCG CCT | CCN |
R93 | H | CAC CAT | CAY | IF R93 CAT NOT R94 AAA
R94 | K | AAA AAG | AAR |
R95 | K | AAA AAG | AAR |
R96 | E | GAA GAG | GAR |
R97 | R | AGA AGG CGA CGC CGG CGT | AGR or CGN |
R98 | V | GTA GTC GTG GTT | GTN |
R99 | N | AAC AAT | AAY |
R100 | H | CAC CAT | CAY | AVOID ATTTA
R101 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
Trinucleotide | AA | Possible trinucleotides | IUPAC code | Proviso
R102 | G | GGA GGC GGG GGT | GGN | IF R102 GGC NOT R103 AAT
R103 | N | AAC AAT | AAY | AVOID ATTTA
R104 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R105 | V | GTA GTC GTG GTT | GTN |
R106 | I | ATA ATC ATT | ATH |
R107 | T | ACA ACC ACG ACT | ACN |
R108 | W | TGG | TGG |
R109 | G | GGA GGC GGG GGT | GGN |
R110 | A | GCA GCC GCG GCT | GCN |
R111 | Q | CAA CAG | CAR |
R112 | T | ACA ACC ACG ACT | ACN | AVOID ATTTA
R113 | F | TTC TTT | TTY |
R114 | K | AAA AAG | AAR | IF R114 AAG NOT R115 CAT
R115 | H | CAC CAT | CAY |
R116 | Q | CAA CAG | CAR | IF R116 CAA NOT R117 GCA
R117 | A | GCA GCC GCG GCT | GCN | AVOID ATTTA
R118 | F | TTC TTT | TTY |
R119 | N | AAC AAT | AAY | NOT AAT
R120 | K | AAA AAG | AAR | IF R120 AAA NOT R121 TTG
R121 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R122 | A | GCA GCC GCG GCT | GCN | IF R122 GCC NOT R123 AAT
R123 | N | AAC AAT | AAY | AVOID ATTTA
R124 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R125 | F | TTC TTT | TTY |
R126 | I | ATA ATC ATT | ATH |
R127 | V | GTA GTC GTG GTT | GTN |
R128 | N | AAC AAT | AAY | IF R128 AAT NOT R129 AAT
R129 | N | AAC AAT | AAY | NOT AAT
R130 | K | AAA AAG | AAR |
R131 | K | AAA AAG | AAR |
R132 | T | ACA ACC ACG ACT | ACN |
R133 | I | ATA ATC ATT | ATH |
R134 | P | CCA CCC CCG CCT | CCN | IF R134 CCC NOT R135 AAT
R135 | N | AAC AAT | AAY | IF R135 AAT NOT R136 AAT
R136 | N | AAC AAT | AAY | AVOID ATTTA
R137 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
Trinucleotide | AA | Possible trinucleotides | IUPAC code | Proviso
R138 | V | GTA GTC GTG GTT | GTN |
R139 | E | GAA GAG | GAR |
R140 | N | AAC AAT | AAY |
R141 | Y | TAC TAT | TAY | AVOID ATTTA
R142 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R143 | T | ACA ACC ACG ACT | ACN |
R144 | P | CCA CCC CCG CCT | CCN | NOT CCA
R145 | M | ATG | ATG |
R146 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN | IF R146 TCA NOT R147 TTG
R147 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R148 | A | GCA GCC GCG GCT | GCN |
R149 | Y | TAC TAT | TAY | NOT TAT
R150 | W | TGG | TGG |
R151 | F | TTC TTT | TTY |
R152 | M | ATG | ATG |
R153 | D | GAC GAT | GAY |
R154 | D | GAC GAT | GAY |
R155 | G | GGA GGC GGG GGT | GGN |
R156 | G | GGA GGC GGG GGT | GGN |
R157 | K | AAA AAG | AAR |
R158 | W | TGG | TGG |
R159 | D | GAC GAT | GAY |
R160 | Y | TAC TAT | TAY |
R161 | N | AAC AAT | AAY | NOT AAT
R162 | K | AAA AAG | AAR | IF R162 AAA NOT (R163 AAT AND R164 AGY)
R163 | N | AAC AAT | AAY |
R164 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN | IF R164 TCA NOT (R165 ACC AND R166 AAY)
R165 | T | ACA ACC ACG ACT | ACN | IF R165 ACC NOT R166 AAT
R166 | N | AAC AAT | AAY | NOT AAT
R167 | K | AAA AAG | AAR | IF R167 AAA R168 NOT TCA OR R168 NOT AGC
R168 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN |
R169 | I | ATA ATC ATT | ATH |
R170 | V | GTA GTC GTG GTT | GTN | IF R170 GTA NOT R171 TTA
R171 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R172 | N | AAC AAT | AAY | IF R172 AAT NOT R173 ACA
R173 | T | ACA ACC ACG ACT | ACN | IF R173 (ACC OR ACG) NOT (R174 CAA AND
Trinucleotide | AA | Possible trinucleotides | IUPAC code | Proviso
(R173, continued) | | | | R175 TCN)
R174 | Q | CAA CAG | CAR |
R175 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN | AVOID ATTTA
R176 | F | TTC TTT | TTY |
R177 | T | ACA ACC ACG ACT | ACN |
R178 | F | TTC TTT | TTY |
R179 | E | GAA GAG | GAR |
R180 | E | GAA GAG | GAR |
R181 | V | GTA GTC GTG GTT | GTN |
R182 | E | GAA GAG | GAR | IF R182 GAA NOT (R183 TAT AND R184 TTR)
R183 | Y | TAC TAT | TAY | AVOID ATTTA
R184 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R185 | V | GTA GTC GTG GTT | GTN |
R186 | K | AAA AAG | AAR |
R187 | G | GGA GGC GGG GGT | GGN | IF R187 GGA NOT (R188 TTG AND R189 CGN)
R188 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R189 | R | AGA AGG CGA CGC CGG CGT | AGR or CGN | IF R189 CGC NOT R190 AAT
R190 | N | AAC AAT | AAY | NOT AAT
R191 | K | AAA AAG | AAR |
R192 | F | TTC TTT | TTY | IF R192 TTC NOT (R193 CAA AND R194 TTR)
R193 | Q | CAA CAG | CAR | IF R193 CAA NOT R194 TTA
R194 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R195 | N | AAC AAT | AAY | IF R195 AAT NOT R196 TGC
R196 | C | TGC TGT | TGY |
R197 | Y | TAC TAT | TAY |
R198 | V | GTA GTC GTG GTT | GTN |
R199 | K | AAA AAG | AAR | NOT AAA
R200 | I | ATA ATC ATT | ATH | IF R200 ATT NOT R201 AAT; NOT ATA
R201 | N | AAC AAT | AAY | NOT AAT
R202 | K | AAA AAG | AAR | IF R202 AAA NOT R203 AAT
R203 | N | AAC AAT | AAY | NOT AAT
R204 | K | AAA AAG | AAR |
R205 | P | CCA CCC CCG CCT | CCN | NOT CCA; IF R205 CCG NOT R206 ATA
R206 | I | ATA ATC ATT | ATH | IF R206 ATA NOT R207 ATA
Trinucleotide | AA | Possible trinucleotides | IUPAC code | Proviso
R207 | I | ATA ATC ATT | ATH | IF R207 ATA NOT R208 TAC; NOT ATT
R208 | Y | TAC TAT | TAY |
R209 | I | ATA ATC ATT | ATH |
R210 | D | GAC GAT | GAY |
R211 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN |
R212 | M | ATG | ATG |
R213 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN |
R214 | Y | TAC TAT | TAY | AVOID ATTTA
R215 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN | IF R215 (TTA OR CTA) NOT R216 ATA
R216 | I | ATA ATC ATT | ATH |
R217 | F | TTC TTT | TTY |
R218 | Y | TAC TAT | TAY |
R219 | N | AAC AAT | AAY | AVOID ATTTA
R220 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN | IF R220 (TTA OR CTA) NOT R221 ATT; IF R220 (TTA OR CTA) NOT R221 ATC
R221 | I | ATA ATC ATT | ATH | IF R221 ATT NOT R222 AAA; NOT ATA
R222 | K | AAA AAG | AAR |
R223 | P | CCA CCC CCG CCT | CCN |
R224 | Y | TAC TAT | TAY | AVOID ATTTA
R225 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R226 | I | ATA ATC ATT | ATH |
R227 | P | CCA CCC CCG CCT | CCN |
R228 | Q | CAA CAG | CAR |
R229 | M | ATG | ATG |
R230 | M | ATG | ATG |
R231 | Y | TAC TAT | TAY | IF R231 TAT NOT R232 AAA
R232 | K | AAA AAG | AAR | (illegible)
R233 | L | TTA TTG CTA CTC CTG CTT | TTR or CTN |
R234 | P | CCA CCC CCG CCT | CCN | IF R234 CCC NOT R235 AAT
R235 | N | AAC AAT | AAY | IF R235 AAT NOT R236 ACA; IF R235 AAT NOT R236 ACT
R236 | T | ACA ACC ACG ACT | ACN | IF R236 ACA NOT (R237 ATA AND R238 AGY)
R237 | I | ATA ATC ATT | ATH |
R238 | S | AGC AGT TCA TCC TCG TCT | AGY or TCN |
[Continuation of the table for the remaining trinucleotides (from R239); the entries on this rotated page are illegible in the source.]
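The IUPAC code column and the provisos of the table can be read mechanically. The following is a minimal sketch, not part of the described method: the helper names expand and violates are hypothetical, and the sketch only handles the simplest proviso form, "IF R<n> <codon> NOT R<n+1> <code>".

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(code):
    """Return the sorted trinucleotides covered by an entry such as 'TTR or CTN'."""
    codons = set()
    for part in code.split(" or "):
        choices = (IUPAC_BASES[base] for base in part.strip())
        codons.update("".join(bases) for bases in product(*choices))
    return sorted(codons)


def violates(codons, first_pos, first_codon, second_code):
    """Hypothetical check of 'IF R<first_pos> <first_codon> NOT R<first_pos+1> <second_code>'.

    codons maps a trinucleotide position (e.g. 23 for R23) to the chosen codon.
    """
    return (codons[first_pos] == first_codon
            and codons[first_pos + 1] in expand(second_code))


print(expand("TTR or CTN"))
print(violates({23: "CCC", 24: "AAT"}, 23, "CCC", "AAT"))
```

Provisos of the forms "NOT XXX", "AVOID XXXXX" or with bracketed AND/OR conditions would need additional handling that is not sketched here.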
Trinucleotide | AA | Choices | IUPAC code | Proviso | Exemplified I-SceI (SEQ ID No 4)
R25 | S | AGC TCA TCC | AGC or TCM | | AGC
R26 | K | AAA AAG | AAR | | AAG
R27 | L | CTC CTG | CTS | | CTC
R28 | L | CTC CTG | CTS | | CTG
R29 | K | AAA AAG | AAR | | AAG
R30 | E | GAG | GAG | | GAG
R31 | Y | TAC | TAC | | TAC
R32 | K | AAA AAG | AAR | | AAG
R33 | S | AGC TCA TCC | AGC or TCM | | AGC
R34 | Q | CAA CAG | CAR | | CAG
R35 | L | CTC CTG | CTS | | CTG
R36 | I | ATC ATT | ATY | | ATC
R37 | E | GAA GAG | GAR | | GAA
R38 | L | CTC CTG | CTS | | CTG
R39 | N | AAC | AAC | | AAC
R40 | I | ATC ATT | ATY | | ATC
R41 | E | GAA GAG | GAR | IF R41 GAG NOT R42 CAA | GAG
R42 | Q | CAA CAG | CAR | | CAG
R43 | F | TTC | TTC | | TTC
R44 | E | GAA GAG | GAR | | GAA
R45 | A | GCC GCT | GCY | | GCT
R46 | G | GGC GGA | GGM | | GGC
R47 | I | ATC | ATC | | ATC
R48 | G | GGC GGA | GGM | | GGC
R49 | L | CTC CTG | CTS | | CTG
R50 | I | ATC ATT | ATY | | ATC
R51 | L | CTC CTG | CTS | | CTG
R52 | G | GGC GGA | GGM | | GGC
R53 | D | GAC GAT | GAY | | GAT
R54 | A | GCC GCT | GCY | | GCC
R55 | Y | TAC | TAC | | TAC
R56 | I | ATC ATT | ATY | | ATC
R57 | R | AGA CGC CGG | AGA or CGS | | AGA
R58 | S | AGC TCA TCC | AGC or TCM | | TCC
R59 | R | AGA CGC CGG | AGA or CGS | | CGG
Trinucleotide | AA | Choices | IUPAC code | Proviso | Exemplified I-SceI (SEQ ID No 4)
R60 | D | GAC GAT | GAY | | GAC
R61 | E | GAA GAG | GAR | | GAA
R62 | G | GGC GGA | GGM | | GGC
R63 | K | AAA AAG | AAR | | AAG
R64 | T | ACC ACT | ACY | | ACC
R65 | Y | TAC | TAC | | TAC
R66 | C | TGC TGT | TGY | | TGC
R67 | M | ATG | ATG | | ATG
R68 | Q | CAG | CAG | | CAG
R69 | F | TTC | TTC | | TTC
R70 | E | GAA GAG | GAR | | GAG
R71 | W | TGG | TGG | | TGG
R72 | K | AAA AAG | AAR | | AAG
R73 | N | AAC | AAC | | AAC
R74 | K | AAA AAG | AAR | | AAG
R75 | A | GCC GCT | GCY | | GCC
R76 | Y | TAC | TAC | | TAC
R77 | M | ATG | ATG | | ATG
R78 | D | GAC GAT | GAY | | GAC
R79 | H | CAC CAT | CAY | | CAC
R80 | V | GTC GTG | GTS | | GTG
R81 | C | TGC TGT | TGY | | TGT
R82 | L | CTC CTG | CTS | | CTG
R83 | L | CTC CTG | CTS | | CTG
R84 | Y | TAC | TAC | | TAC
R85 | D | GAC GAT | GAY | IF R85 GAC NOT R86 CAA | GAC
R86 | Q | CAA CAG | CAR | | CAG
R87 | W | TGG | TGG | | TGG
R88 | V | GTC GTG | GTS | | GTC
R89 | L | CTC CTG | CTS | | CTG
R90 | S | AGC TCA TCC | AGC or TCM | | AGC
R91 | P | CCA CCC CCT | CCH | | CCT
R92 | P | CCA CCC CCT | CCH | | CCT
R93 | H | CAC CAT | CAY | IF R93 CAT NOT R94 AAA | CAC
R94 | K | AAA AAG | AAR | | AAG
Trinucleotide | AA | Choices | IUPAC code | Proviso | Exemplified I-SceI (SEQ ID No 4)
R95 | K | AAA AAG | AAR | | AAG
R96 | E | GAA GAG | GAR | | GAG
R97 | R | AGA CGC CGG | AGA or CGS | | CGC
R98 | V | GTC GTG | GTS | | GTG
R99 | N | AAC | AAC | | AAC
R100 | H | CAC CAT | CAY | | CAT
R101 | L | CTC CTG | CTS | | CTG
R102 | G | GGC GGA | GGM | | GGC
R103 | N | AAC | AAC | | AAC
R104 | L | CTC CTG | CTS | | CTC
R105 | V | GTC GTG | GTS | | GTG
R106 | I | ATC ATT | ATY | | ATC
R107 | T | ACC ACT | ACY | | ACC
R108 | W | TGG | TGG | | TGG
R109 | G | GGC GGA | GGM | | GGA
R110 | A | GCC GCT | GCY | | GCC
R111 | Q | CAA CAG | CAR | | CAG
R112 | T | ACC ACT | ACY | | ACC
R113 | F | TTC | TTC | | TTC
R114 | K | AAA AAG | AAR | IF R114 AAG NOT R115 CAT | AAG
R115 | H | CAC CAT | CAY | | CAC
R116 | Q | CAA CAG | CAR | | CAG
R117 | A | GCC GCT | GCY | | GCC
R118 | F | TTC | TTC | | TTC
R119 | N | AAC | AAC | | AAC
R120 | K | AAA AAG | AAR | | AAG
R121 | L | CTC CTG | CTS | | CTG
R122 | A | GCC GCT | GCY | | GCC
R123 | N | AAC | AAC | | AAC
R124 | L | CTC CTG | CTS | | CTG
R125 | F | TTC | TTC | | TTC
R126 | I | ATC ATT | ATY | | ATC
R127 | V | GTC GTG | GTS | | GTG
R128 | N | AAC | AAC | | AAC
R129 | N | AAC | AAC | | AAC
Trinucleotide | AA | Choices | IUPAC code | Proviso | Exemplified I-SceI (SEQ ID No 4)
R130 | K | AAA AAG | AAR | | AAG
R131 | K | AAA AAG | AAR | | (illegible)
R132 | T | ACC ACT | ACY | | ACC
R133 | I | ATC ATT | ATY | | ATC
R134 | P | CCA CCC CCT | CCH | | CCC
R135 | N | AAC | AAC | | AAC
R136 | N | AAC | AAC | | AAC
R137 | L | CTC CTG | CTS | | CTC
R138 | V | GTC GTG | GTS | | GTG
R139 | E | GAA GAG | GAR | | GAG
R140 | N | AAC | AAC | | AAC
R141 | Y | TAC | TAC | | TAC
R142 | L | CTC CTG | CTS | | CTC
R143 | T | ACC ACT | ACY | | ACT
R144 | P | CCC CCT | CCY | | CCC
R145 | M | ATG | ATG | | ATG
R146 | S | AGC TCA TCC | AGC or TCM | | AGC
R147 | L | CTC CTG | CTS | | CTG
R148 | A | GCC GCT | GCY | | GCC
R149 | Y | TAC | TAC | | TAC
R150 | W | TGG | TGG | | TGG
R151 | F | TTC | TTC | | TTC
R152 | M | ATG | ATG | | ATG
R153 | D | GAC GAT | GAY | | GAC
R154 | D | GAC GAT | GAY | | GAC
R155 | G | GGC GGA | GGM | | GGA
R156 | G | GGC GGA | GGM | | GGC
R157 | K | AAA AAG | AAR | | AAG
R158 | W | TGG | TGG | | TGG
R159 | D | GAC GAT | GAY | | GAC
R160 | Y | TAC | TAC | | TAC
R161 | N | AAC | AAC | | AAC
R162 | K | AAA AAG | AAR | | AAG
R163 | N | AAC | AAC | | AAC
R164 | S | AGC TCA TCC | AGC or TCM | IF R164 TCA NOT R165 ACC | AGC
Trinucleotide | AA | Choices | IUPAC code | Proviso | Exemplified I-SceI (SEQ ID No 4)
R165 | T | ACC ACT | ACY | | ACC
R166 | N | AAC | AAC | | AAC
R167 | K | AAA AAG | AAR | IF R167 AAA R168 NOT TCA OR R168 NOT AGC | AAG
R168 | S | AGC TCA TCC | AGC or TCM | | TCA
R169 | I | ATC ATT | ATY | | (illegible)
R170 | V | GTC GTG | GTS | | GTG
R171 | L | CTC CTG | CTS | | CTG
R172 | N | AAC | AAC | | AAC
R173 | T | ACC ACT | ACY | IF R173 ACC NOT (R174 CAA AND R175 TCN) | ACC
R174 | Q | CAA CAG | CAR | | CAA
R175 | S | AGC TCA TCC | AGC or TCM | | AGC
R176 | F | TTC | TTC | | TTC
R177 | T | ACC ACT | ACY | | ACC
R178 | F | TTC | TTC | | TTC
R179 | E | GAA GAG | GAR | | GAA
R180 | E | GAA GAG | GAR | | GAA
R181 | V | GTC GTG | GTS | | GTG
R182 | E | GAA GAG | GAR | | GAG
R183 | Y | TAC | TAC | | TAC
R184 | L | CTC CTG | CTS | | CTC
R185 | V | GTC GTG | GTS | | GTC
R186 | K | AAA AAG | AAR | | AAG
R187 | G | GGC GGA | GGM | | GGC
R188 | L | CTC CTG | CTS | | CTG
R189 | R | AGA CGC CGG | AGA or CGS | | CGC
R190 | N | AAC | AAC | | AAC
R191 | K | AAA AAG | AAR | | AAG
R192 | F | TTC | TTC | | TTC
R193 | Q | CAA CAG | CAR | | CAG
R194 | L | CTC CTG | CTS | | CTG
R195 | N | AAC | AAC | | AAC
R196 | C | TGC TGT | TGY | | TGC
R197 | Y | TAC | TAC | | TAC
Trinucleotide | AA | Choices | IUPAC code | Proviso | Exemplified I-SceI (SEQ ID No 4)
R198 | V | GTC GTG | GTS | | GTG
R199 | K | AAG | AAG | | AAG
R200 | I | ATC ATT | ATY | | ATC
R201 | N | AAC | AAC | | AAC
R202 | K | AAA AAG | AAR | | AAG
R203 | N | AAC | AAC | | AAC
R204 | K | AAA AAG | AAR | | AAG
R205 | P | CCC CCT | CCY | | CCT
R206 | I | ATC ATT | ATY | | ATC
R207 | I | ATC | ATC | | ATC
R208 | Y | TAC | TAC | | TAC
R209 | I | ATC ATT | ATY | | ATC
R210 | D | GAC GAT | GAY | | GAC
R211 | S | AGC TCA TCC | AGC or TCM | | AGC
R212 | M | ATG | ATG | | ATG
R213 | S | AGC TCA TCC | AGC or TCM | | AGC
R214 | Y | TAC | TAC | | TAC
R215 | L | CTC CTG | CTS | | CTG
R216 | I | ATC ATT | ATY | | ATC
R217 | F | TTC | TTC | | TTC
R218 | Y | TAC | TAC | | TAC
R219 | N | AAC | AAC | | AAC
R220 | L | CTC CTG | CTS | | CTG
R221 | I | ATC ATT | ATY | IF R221 ATT NOT R222 AAA | ATC
R222 | K | AAA AAG | AAR | | AAG
R223 | P | CCA CCC CCT | CCH | | CCA
R224 | Y | TAC | TAC | | TAC
R225 | L | CTC CTG | CTS | | CTG
R226 | I | ATC ATT | ATY | | ATC
R227 | P | CCA CCC CCT | CCH | | CCT
R228 | Q | CAA CAG | CAR | | CAG
R229 | M | ATG | ATG | | ATG
R230 | M | ATG | ATG | | ATG
R231 | Y | TAC | TAC | | TAC
R232 | K | AAA AAG | AAR | | AAG
Examples

Example 1: Design, synthesis and analysis of a plant expressible chimeric gene
encoding I-SceI.

The coding region of I-SceI wherein the 4 aminoterminal amino acids have been replaced by
a nuclear localization signal was optimized using the following process:

1. Change the codons to the most preferred codon usage for maize without altering the
amino acid sequence of I-SceI protein, using the Synergy Geneoptimizer™;

2. Adjust the sequence to create or eliminate specific restriction sites to exchange the
synthetic I-SceI coding region with the universal code I-SceI gene;

3. Eliminate all GC stretches longer than 6 bp and AT stretches longer than 4 bp to
avoid formation of secondary RNA structures that can affect pre-mRNA splicing;

4. Avoid CG and TA duplets in codon positions 2 and 3;

5. Avoid other regulatory elements such as possible premature polyadenylation signals
(GATAAT, TATAAA, AATATA, AATATT, GATAAA, AATGAA, AATAAG,
AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, ATACTA, ATAAAA,
ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA,
AATACA and CATAAA), cryptic intron splice sites (AAGGTAAGT and
TGCAGG), ATTTA pentamers and CCAAT box sequences (CCAAT, ATTGG,
CGAAT and ATTGC);

6. Recheck whether the adapted coding region fulfills all of the above-mentioned criteria.
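
The recheck of step 6 lends itself to automation. The following is a minimal sketch of such a check for criteria 3 to 5, not the procedure actually used: the function names are hypothetical, a "GC stretch" or "AT stretch" is assumed to mean an uninterrupted run of G/C or A/T bases, and the sequence is assumed to start in frame at a codon boundary.

```python
import re

# Motifs copied from criterion 5 above.
POLYA_SIGNALS = [
    "GATAAT", "TATAAA", "AATATA", "AATATT", "GATAAA", "AATGAA", "AATAAG",
    "AATAAA", "AATAAT", "AACCAA", "ATATAA", "AATCAA", "ATACTA", "ATAAAA",
    "ATGAAA", "AAGCAT", "ATTAAT", "ATACAT", "AAAATA", "ATTAAA", "AATTAA",
    "AATACA", "CATAAA",
]
SPLICE_SITES = ["AAGGTAAGT", "TGCAGG"]
CCAAT_BOXES = ["CCAAT", "ATTGG", "CGAAT", "ATTGC"]


def find_all(seq, motif):
    """Yield every start index of motif in seq, including overlapping hits."""
    start = seq.find(motif)
    while start != -1:
        yield start
        start = seq.find(motif, start + 1)


def check_coding_region(seq):
    """Return a list of (label, position, text) tuples for criteria 3 to 5."""
    seq = seq.upper()
    problems = []
    # Criterion 3 (assumed reading): runs of G/C longer than 6, of A/T longer than 4.
    for match in re.finditer(r"[GC]{7,}", seq):
        problems.append(("GC stretch", match.start(), match.group()))
    for match in re.finditer(r"[AT]{5,}", seq):
        problems.append(("AT stretch", match.start(), match.group()))
    # Criterion 4: CG or TA in codon positions 2 and 3.
    for i in range(0, len(seq) - 2, 3):
        if seq[i + 1:i + 3] in ("CG", "TA"):
            problems.append(("CG/TA duplet", i, seq[i:i + 3]))
    # Criterion 5: listed regulatory motifs anywhere in the sequence.
    motif_groups = [
        ("polyadenylation signal", POLYA_SIGNALS),
        ("cryptic splice site", SPLICE_SITES),
        ("ATTTA pentamer", ["ATTTA"]),
        ("CCAAT box", CCAAT_BOXES),
    ]
    for label, motifs in motif_groups:
        for motif in motifs:
            for start in find_all(seq, motif):
                problems.append((label, start, motif))
    return problems


for problem in check_coding_region("AAATTTAAA"):
    print(problem)
```

Criteria 1 and 2 (codon preference and restriction sites) depend on data not given here and are left out of the sketch.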

A possible example of such a nucleotide sequence is represented in SEQ ID No 4. A 
synthetic DNA fragment having the nucleotide sequence of SEQ ID No 4 was synthesized 
and operably linked to a CaMV35S promoter and a CaMV35S 3' termination and 
polyadenylation signal (yielding plasmid pCV78; SEQ ID No 7). 

The synthetic I-SceI coding region was also cloned into a bacterial expression vector (as a
fusion protein allowing protein enrichment on amylose beads). The capacity of semi-purified
I-SceI protein to cleave in vitro a plasmid containing an I-SceI recognition site was verified.
Example 2. Isolation of maize cell lines containing a promoterless bar gene preceded by
an I-SceI site.

In order to develop an assay for double stranded DNA break induced homology-mediated
recombination, maize cell suspensions were isolated that contained a promoterless bar gene
preceded by an I-SceI recognition site integrated in the nuclear genome in single copy. Upon
double stranded DNA break induction through delivery of an I-SceI endonuclease encoding
plant expressible chimeric gene, and co-delivery of repair DNA comprising a CaMV 35S
promoter operably linked to the 5' end of the bar gene, the 35S promoter may be inserted
through homology mediated targeted DNA insertion, resulting in a functional bar gene
allowing resistance to phosphinothricin (PPT). The assay is schematically represented in
Figure 1.

The target locus was constructed by operably linking through conventional cloning
techniques the following DNA regions:

a) a 3' end termination and polyadenylation signal from the nopaline synthetase gene

b) a promoter-less bar encoding DNA region

c) a DNA region comprising an I-SceI recognition site

d) a 3' end termination and polyadenylation signal from A. tumefaciens gene 7 (3'g7)

e) a plant expressible neomycin resistance gene comprising a nopaline synthetase promoter,
a neomycin phosphotransferase gene, and a 3' ocs signal.

This DNA region was inserted in a T-DNA vector between the T-DNA borders. The T-DNA
vector was designated pTTAM78 (for nucleotide sequence of the T-DNA see SEQ ID No 5).

The T-DNA vector was used directly to transform protoplasts of corn according to the
methods described in EP 0 469 273, using a He89-derived corn cell suspension. The T-DNA
vector was also introduced into Agrobacterium tumefaciens C58C1Rif(pEHA101) and the
resulting Agrobacterium was used to transform an He89-derived cell line. A number of target
lines were identified that contained a single copy of the target locus construct pTTAM78,
such as T24 (obtained by protoplast transformation) and lines 14-1 and 1-20 (obtained by
Agrobacterium mediated transformation).
Cell suspensions were established from these target lines in N6M cell suspension medium,
and grown in the light on a shaker (120 rpm) at 25°C. Suspensions were subcultured every
week.

Example 3: Homology based targeted insertion. 

The repair DNA pTTA82 is a T-DNA vector containing between the T-DNA borders the
following operably linked DNA regions:

a) a DNA region encoding only the aminoterminal part of the bar gene

b) a DNA region comprising a partial I-SceI recognition site (13 nucleotides located at the
5' end of the recognition site)

c) a CaMV 35S promoter region

d) a DNA region comprising a partial I-SceI recognition site (9 nucleotides located at the 3'
end of the recognition site)

e) a 3' end termination and polyadenylation signal from A. tumefaciens gene 7 (3'g7)

f) a chimeric plant expressible neomycin resistance gene

g) a defective I-SceI endonuclease encoding gene under control of a CaMV 35S promoter

The nucleotide sequence of the T-DNA of pTTA82 is represented in SEQ ID No 6.

This repair DNA was co-delivered with pCV78 (see Example 1) by particle bombardment 
into suspension derived cells which were plated on filter paper as a thin layer. The filter 
paper was plated on Mahql VII substrate. 

The DNA was bombarded into the cells using a PDS-1000/He Biolistics device. Microcarrier
preparation and coating of DNA onto microcarriers was essentially as described by Sanford
et al., 1992. Particle bombardment parameters were: target distance of 9 cm; bombardment
pressure of 1350 psi, gap distance of 1/4" and macrocarrier flight distance of 11 cm.
Immediately after bombardment the tissue was transferred onto non-selective MhilVII
substrate. As a control for successful delivery of DNA by particle bombardment, the three
target lines were also bombarded with microcarriers coated with plasmid DNA comprising a
chimeric bar gene under the control of a CaMV35S promoter (pRVA52).

Four days after bombardment, the filters were transferred onto Mh1VII substrate 
supplemented with 25 mg/L PPT or onto Ahx1.5VIIino1000 substrate supplemented with 50 
mg/L PPT. 

Fourteen days later, the filters were transferred onto fresh Mh1VII medium with 10 mg/L 
PPT for the target lines T24 and 14-1 and Mh1VII substrate with 25 mg/L PPT for target 
line 1-20. 

Two weeks later, potential targeted insertion events were scored based on their resistance to 
PPT. These PPT resistant events were also positive in the Liberty Link Corn Leaf/Seed test 
(Strategic Diagnostics Inc.). 



Number of PPT resistant calli 38 days after bombardment: 

              pRVA52                                 pTTA82 + pCV78
Target line   Total number of    Mean number of     Total number of    Mean number of
              PPT-resistant      PPT-resistant      PPT-resistant      PPT-resistant
              events             events/petridish   events             events/petridish
1-20          75                 25                 115                7.6
14-1          37                 12.3               38                 2.2
24            40                 13.3               2                  0.13


The PPT resistant events were further subcultured on Mh1VII substrate containing 10 mg/L 
PPT and callus material was used for molecular analysis. Twenty independent candidate 
targeted sequence insertion (TSI) events were analyzed by Southern analysis using the 35S 
promoter and the 3' end termination and polyadenylation signal from the nopaline synthase 
gene as a probe. Based on the size of the expected fragment, all events appeared to be 
perfect targeted sequence insertion events. Moreover, further analysis of about half of the 
targeted sequence insertion events did not show additional non-targeted integration of 
either the repair DNA or the I-SceI encoding DNA. 

Sequence analysis of DNA amplified from eight of the targeted insertion events 
demonstrated that these events were indeed perfect homologous recombination based TSI 
events. 

Based on these data, the ratio of homologous recombination based DNA insertion versus the 
"normal" illegitimate recombination varies from about 30% for 1-20 to about 17% for 14-1 
and to about 1% for 24. 
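
The ratios quoted above can be approximately reproduced from the table of Example 3. The following is a minimal sketch, assuming that the ratio is taken as the mean number of PPT resistant events per petridish obtained with pTTA82 + pCV78 divided by the mean number obtained with the pRVA52 control; the text does not state how the ratio was calculated, and the variable and function names are illustrative only.

```python
# Mean number of PPT resistant events per petridish, copied from the table in Example 3.
control_mean = {"1-20": 25.0, "14-1": 12.3, "24": 13.3}   # pRVA52 (random integration control)
targeted_mean = {"1-20": 7.6, "14-1": 2.2, "24": 0.13}    # pTTA82 + pCV78 (targeted insertion)


def hr_ratio_percent(line):
    """Assumed definition: targeted events per dish relative to control events per dish, in %."""
    return 100.0 * targeted_mean[line] / control_mean[line]


ratios = {line: round(hr_ratio_percent(line), 1) for line in control_mean}

for line, ratio in ratios.items():
    print(f"target line {line}: {ratio}%")
```

Under this assumption the calculation gives values close to the rounded figures in the text for the three target lines.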

When using vectors similar to the ones described in Puchta et al, 1996 (supra) delivered by 
electroporation to tobacco protoplasts in the presence of I-SceI induced double stranded 
DNA breaks, the ratio of homologous recombination based DNA insertion versus normal 
insertion was about 15%. However, only one out of 33 characterized events was a 
homology-mediated targeted sequence insertion event whereby the homologous 
recombination was perfect at both sides of the double stranded break. 

Using the vectors from Example 2, but with a "universal code I-SceI construct" comprising a 
nuclear localization signal, the ratio of HR based DNA insertion versus normal insertion 
varied between 0.032% and 16% for different target lines, using either electroporation or 
Agrobacterium mediated DNA delivery. The relative frequency of perfect targeted insertion 
events differed between the different target lines, and varied from 8 to 70% for 
electroporation mediated DNA delivery and from 73 to 90% for Agrobacterium mediated 
DNA delivery. 

Example 4. Acetosyringone pre-incubation improves the frequency of recovery of 
targeted insertion events. 

One week before bombardment as described in Example 3, cell suspensions were either 
diluted in N6M medium or in LSIDhy1.5 medium supplemented with 200 µM 
acetosyringone. Otherwise, the method as described in Example 3 was employed. As can be 
seen from the results summarized in the following table, preincubation of the cells to be 
transformed with acetosyringone had a beneficial effect on the recovery of targeted PPT 
resistant insertion events. 



              Preincubation with acetosyringone      No preincubation
Target line   Total number of    Mean number of     Total number of    Mean number of
              PPT-resistant      PPT-resistant      PPT-resistant      PPT-resistant
              events             events/petridish   events             events/petridish
1-20          89                 7.6                26                 3.7
14-1          32                 3.6                6                  0.75
24            0                  0                  2                  0.3
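
The size of the acetosyringone effect can be read from the table as a fold change per target line. The following is a minimal sketch, assuming the mean number of PPT resistant events per petridish is the appropriate measure; the names used are illustrative and not part of the described method.

```python
# (total events, mean events per petridish), copied from the table in Example 4.
with_acetosyringone = {"1-20": (89, 7.6), "14-1": (32, 3.6), "24": (0, 0.0)}
without_preincubation = {"1-20": (26, 3.7), "14-1": (6, 0.75), "24": (2, 0.3)}


def fold_change(line):
    """Mean events per dish with preincubation divided by the mean without it.

    Returns None when the baseline is zero and the ratio is undefined.
    """
    baseline = without_preincubation[line][1]
    if baseline == 0:
        return None
    return with_acetosyringone[line][1] / baseline


folds = {line: fold_change(line) for line in with_acetosyringone}

for line, fold in folds.items():
    print(f"target line {line}: fold change = {fold:.2f}")
```

On these figures the increase is roughly twofold for target line 1-20 and close to fivefold for 14-1, whereas no events were recovered for target line 24 after preincubation.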



Example 5: DSB-mediated targeted sequence insertion in maize by Agrobacterium- 
mediated delivery of repair DNA. 

To analyze DSB-mediated targeted sequence insertion in maize, whereby the repair DNA is 
delivered by Agrobacterium-mediated transformation, T-DNA vectors were constructed 
similar to pTTA82 (see Example 3), wherein the defective I-SceI was replaced by the 
synthetic I-SceI encoding gene of Example 1. The T-DNA vector further contained a copy of 
the Agrobacterium tumefaciens virG and virC (pTCV83) or virG, virC and virB (pTCV87) 
outside the T-DNA borders. These T-DNA vectors were inserted into LBA4404, containing 
the helper Ti-plasmid pAL4404, yielding Agrobacterium strains A4495 and A4496 
respectively. 

Suspension cultures of the target cell lines of Example 2, as well as other target cell lines 
obtained in a similar way as described in Example 2, were co-cultivated with the 
Agrobacterium strains, and plated thereafter on a number of plates. The number of platings 
was determined by the density of the cell suspension. As a control for the transformation 
efficiency, the cell suspensions were co-cultivated in a parallel experiment with an 
Agrobacterium strain LBA4404 containing helper Ti-plasmid pAL4404 and a T-DNA vector 
with a chimeric phosphinothricin resistance gene (bar gene) under control of a CaMV 35S 
promoter. The T-DNA vector further contained a copy of the Agrobacterium tumefaciens virG, 
virC and virB genes, outside the T-DNA border. The results of four different independent 
experiments are summarized in the tables below: 



Agrobacterium experiment I: 

              Control                                A4495
Target line   N° of platings   N° of transformants   N° of platings   N° of TSI events
T24           26               10                    32               0
T26           36               44                    36               1
14-1          20               18                    28               0
T1F155        26               7                     24               0



Agrobacterium experiment II: 

              Control                                A4495
Target line   N° of platings   N° of transformants   N° of platings   N° of TSI events
1-20          18               ~200                  27               11
T79           24               ~480                  24               6
T66           26               73                    31               0
T5            28               35                    18               0
T1F154        22               65                    16               1

Agrobacterium experiment III: 

              Control                                A4496
Target line   N° of platings   N° of transformants   N° of platings   N° of TSI events
T24           50               ~2250                 30               1
T26           44               ~220                  32               1
14-1          20               ~1020                 13               1
T1F155        33               ~1870                 32               0



Agrobacterium experiment IV: 

              A3970                                  A4496
Target line   N° of platings   N° of transformants   N° of platings   N° of TSI events
T1F154                                               28               1
T5            12               ~600                  28               1
T66                                                  28               0
T79                                                  24               0
1-20          18               ~400                  40               9



Thus, while Agrobacterium-mediated repair DNA delivery is clearly feasible, 
the frequency of Targeted Sequence Insertion (TSI) events is lower in comparison with 
particle bombardment-mediated repair DNA delivery. Southern analysis performed on 23 
putative TSI events showed that 20 TSI events are perfect, based on the size of the fragment. 
However, in contrast with the events obtained by microprojectile bombardment as in 
Example 3, only 6 events out of 20 did not contain additional inserts of the repair DNA, 9 
events did contain 1 to 3 additional inserts of the repair DNA, and 5 events contained many 
additional inserts of the repair DNA. 

Particle bombardment mediated delivery of repair DNA also results in better quality of DSB 
mediated TSI events compared to delivery of repair DNA by Agrobacterium. This is in 
contrast to particle bombardment mediated delivery of "normal transforming DNA", which 
is characterized by the lesser quality of the transformants (complex integration pattern) in 
comparison with Agrobacterium-mediated transformation. 

This indicates that the quality of transformants obtained by particle bombardment or other 
direct DNA delivery methods can be improved by DSB mediated insertion of sequences. 
This result is also confirmed by the following experiment: upon DSB mediated targeted 
sequence insertion of a 35S promoter, in absence of flanking sequences with homology to 
the target locus in the repair DNA, we observed that upon electroporation-mediated delivery 
of repair DNA, only a minority of the TSI events contained additional non-targeted 
insertions of the 35S promoter (2 out of 16 analyzed TSI events showed additional 
random insertion(s) of the 35S promoter). In contrast, random insertion of the 35S promoter 
was considerably more frequent in TSI events obtained by Agrobacterium mediated delivery of 
the 35S promoter (17 out of 22 analyzed TSI events showed additional random insertion(s) of 
the 35S promoter). 
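
The event counts reported in this Example can be put on a common scale by expressing them as the percentage of analyzed TSI events that carry additional, non-targeted insertions. The following is a minimal sketch using only the counts stated above; the labels and the helper function are illustrative, and no statistical test is implied.

```python
# (events with additional insertions, events analyzed), taken from the text of Example 5.
additional_insertions = {
    # 9 events with 1 to 3 extra inserts plus 5 events with many extra inserts, out of 20
    "Agrobacterium, repair DNA with homology": (9 + 5, 20),
    "electroporation, 35S promoter without homology": (2, 16),
    "Agrobacterium, 35S promoter without homology": (17, 22),
}


def percent_with_extra_inserts(label):
    """Percentage of analyzed TSI events that also carry non-targeted insertions."""
    affected, analyzed = additional_insertions[label]
    return round(100.0 * affected / analyzed, 1)


percentages = {label: percent_with_extra_inserts(label) for label in additional_insertions}

for label, pct in percentages.items():
    print(f"{label}: {pct}% of TSI events with additional insertions")
```

Expressed this way, the two Agrobacterium-mediated deliveries show additional insertions in well over half of the analyzed events, against one eighth of the events for electroporation.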

Example 6: Media composition 

Mahq1VII: N6 medium (Chu et al. 1975) supplemented with 100 mg/L casein hydrolysate, 6 
mM L-proline, 0.5 g/L 2-(N-morpholino)ethanesulfonic acid (MES), 0.2 M mannitol, 0.2 M 
sorbitol, 2% sucrose, 1 mg/L 2,4-dichlorophenoxy acetic acid (2,4-D), adjusted to pH 5.8, 
solidified with 2.5 g/L Gelrite®. 

Mhi1VII: N6 medium (Chu et al. 1975) supplemented with 0.5 g/L 
2-(N-morpholino)ethanesulfonic acid (MES), 0.2 M mannitol, 2% sucrose, 1 mg/L 
2,4-dichlorophenoxy acetic acid (2,4-D), adjusted to pH 5.8, solidified with 2.5 g/L Gelrite®. 

Mh1VII: idem to Mhi1VII substrate but without 0.2 M mannitol. 

Ahx1.5VIIino1000: MS salts supplemented with 1000 mg/L myo-inositol, 0.1 mg/L 
thiamine-HCl, 0.5 mg/L nicotinic acid, 0.5 mg/L pyridoxine-HCl, 0.5 g/L MES, 30 g/L 
sucrose, 10 g/L glucose, 1.5 mg/L 2,4-D, adjusted to pH 5.8, solidified with 2.5 g/L Gelrite®. 

LSIDhy1.5: MS salts supplemented with 0.5 mg/L nicotinic acid, 0.5 mg/L pyridoxine-HCl, 
1 mg/L thiamine-HCl, 100 mg/L myo-inositol, 6 mM L-proline, 0.5 g/L MES, 20 g/L sucrose, 
10 g/L glucose, 1.5 mg/L 2,4-D, adjusted to pH 5.2. 

N6M: macro elements: 2830 mg/L KNO3; 433 mg/L (NH4)2SO4; 166 mg/L CaCl2.2H2O; 250 
mg/L MgSO4.7H2O; 400 mg/L KH2PO4; 37.3 mg/L Na2EDTA; 27.3 mg/L FeSO4.7H2O, MS 
micro elements, 500 mg/L Bactotrypton, 0.5 g/L MES, 1 mg/L thiamine-HCl, 0.5 mg/L nicotinic 
acid; 0.5 mg/L pyridoxine-HCl, 2 mg/L glycine, 100 mg/L myo-inositol, 3% sucrose, 0.5 mg/L 
2,4-D, adjusted to pH 5.8. 

What is claimed is: 

1. A method for introducing a foreign DNA of interest into a preselected site of a genome 
of a plant cell comprising the steps of 

(a) inducing a double stranded DNA break at the preselected site in the genome of the 
cell ; 

(b) introducing the foreign DNA of interest into the plant cell ; 
characterized in that the foreign DNA is delivered by direct DNA transfer. 

2. The method of claim 1 wherein said direct DNA transfer is accomplished by 
bombardment of microprojectiles coated with the foreign DNA of interest. 

3. The method of claim 1 or 2, wherein said foreign DNA of interest is flanked by a DNA 
region having at least 80% sequence identity to a DNA region flanking the preselected 
site. 

4. The method of any one of claims 1 to 3, wherein said double stranded DNA break is 
induced by introduction of an I-SceI encoding gene. 

5. The method of claim 4 wherein said I-SceI encoding gene comprises a nucleotide 
sequence encoding the amino acid sequence of SEQ ID No 1, wherein said nucleotide 
sequence has a GC content of about 50% to about 60%, provided that 

vii) said nucleotide sequence does not comprise a nucleotide sequence selected from the 
group consisting of GATAAT, TATAAA, AATATA, AATATT, GATAAA, 
AATGAA, AATAAG, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, 
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, 
ATTAAA, AATTAA, AATACA and CATAAA; 

viii) said nucleotide does not comprise a nucleotide sequence selected from the group 
consisting of CCAAT, ATTGG, GCAAT and ATTGC; 

ix) said nucleotide sequence does not comprise a sequence selected from the group 
consisting of ATTTA, AAGGT, AGGTA, GGTA or GCAGG; 

x) said nucleotide sequence does not comprise a GC stretch consisting of 7 consecutive 
nucleotides selected from the group of G or C; 

xi) said nucleotide sequence does not comprise a AT stretch consisting of 5 consecutive 
nucleotides selected from the group of A or T; and 

xii) said nucleotide sequence does not comprise the codons TTA, CTA, ATA, GTA, 
TCG, CCG, ACG and GCG. 

6. The method of claim 5, wherein the I-SceI encoding gene comprises the nucleotide 
sequence of SEQ ID No 4. 

7. The method of any of the foregoing claims, whereby the plant cell is a maize cell. 

8. The method of claim 7, wherein the maize cell is comprised within a cell suspension. 

9. The method of any of the foregoing claims, whereby said plant cell is incubated in a 
plant phenolic compound prior to step a). 

10. The method of claim 9, wherein said plant phenolic compound is acetosyringone. 

11. A method for introducing a foreign DNA of interest into a preselected site of a genome 
of a plant cell comprising the steps of 

(a) inducing a double stranded DNA break at the preselected site in the genome of the 
cell ; 

(b) introducing the foreign DNA of interest into the plant cell ; 

characterized in that the double stranded DNA break is introduced by a rare cutting 
endonuclease encoded by a nucleotide sequence wherein said nucleotide sequence has a 
GC content of about 50% to about 60%, provided that 

i) said nucleotide sequence does not comprise a nucleotide sequence selected from 
the group consisting of GATAAT, TATAAA, AATATA, AATATT, GATAAA, 
AATGAA, AATAAG, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, 
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, 
ATTAAA, AATTAA, AATACA and CATAAA; 

ii) said nucleotide does not comprise a nucleotide sequence selected from the group 
consisting of CCAAT, ATTGG, GCAAT and ATTGC; 

iii) said nucleotide sequence does not comprise a sequence selected from the group 
consisting of ATTTA, AAGGT, AGGTA, GGTA or GCAGG; 

iv) said nucleotide sequence does not comprise a GC stretch consisting of 7 
consecutive nucleotides selected from the group of G or C; 

v) said nucleotide sequence does not comprise a AT stretch consisting of 5 
consecutive nucleotides selected from the group of A or T; and 

vi) said nucleotide sequence does not comprise the codons TTA, CTA, ATA, GTA, 
TCG, CCG, ACG and GCG. 

12. The method of claim 11, wherein the nucleotide sequence comprises the nucleotide 
sequence of SEQ ID 4. 

13. The method of claim 11 or 12, wherein the foreign DNA of interest is introduced into 
said plant cell by direct DNA transfer. 

14. The method of any one of claims 11 to 13, wherein said direct DNA transfer is 
accomplished by bombardment of microprojectiles coated with the foreign DNA of 
interest. 

15. The method of any one of claims 11 to 14, wherein said foreign DNA of interest is 
flanked by a DNA region having at least 80% sequence identity to a DNA region 
flanking the preselected site. 

16. The method of any one of claims 11 to 15, wherein said double stranded DNA break is 
induced by introduction of an I-SceI encoding gene. 

17. The method of any of the foregoing claims, whereby the plant cell is a maize cell. 

18. The method of claim 17, wherein the maize cell is comprised within a cell suspension. 

19. The method of any of the foregoing claims, whereby said plant cell is incubated in a 
plant phenolic compound prior to step a). 

20. The method of claim 19, wherein said plant phenolic compound is acetosyringone. 

21. A method for introducing a foreign DNA of interest into a preselected site of a genome 
of a plant cell comprising the steps of 

(a) inducing a double stranded DNA break at the preselected site in the genome of the 
cell; 

(b) introducing the foreign DNA of interest into the plant cell ; 

characterized in that prior to step a, the plant cells are incubated in a plant phenolic 
compound. 

22. The method according to claim 21, wherein said plant phenolic compound is selected 
from the group of acetosyringone (3,5-dimethoxy-4-hydroxyacetophenone), 
α-hydroxy-acetosyringone, sinapinic acid (3,5-dimethoxy-4-hydroxycinnamic acid), syringic 
acid (4-hydroxy-3,5-dimethoxybenzoic acid), ferulic acid (4-hydroxy-3-methoxycinnamic 
acid), catechol (1,2-dihydroxybenzene), p-hydroxybenzoic acid (4-hydroxybenzoic acid), 
β-resorcylic acid (2,4-dihydroxybenzoic acid), protocatechuic acid 
(3,4-dihydroxybenzoic acid), pyrrogallic acid (2,3,4-trihydroxybenzoic acid), gallic acid 
(3,4,5-trihydroxybenzoic acid) and vanillin (3-methoxy-4-hydroxybenzaldehyde). 

23. A method for introducing a foreign DNA of interest into a preselected site of a genome 
of a plant cell comprising the steps of 

(a) inducing a double stranded DNA break at the preselected site in the genome of the 
cell by a rare cutting endonuclease ; 

(b) introducing the foreign DNA of interest into the plant cell ; 
characterized in that said endonuclease comprises a nuclear localization signal. 

24. An isolated DNA fragment comprising a nucleotide sequence encoding the amino acid 
sequence of SEQ ID No 1, wherein the nucleotide sequence has a GC content of about 
50% to about 60%, provided that 
i) said nucleotide sequence does not comprise a nucleotide sequence selected from 
the group consisting of GATAAT, TATAAA, AATATA, AATATT, GATAAA, 
AATGAA, AATAAG, AATAAA, AATAAT, AACCAA, ATATAA, AATCAA, 
ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, 
ATTAAA, AATTAA, AATACA and CATAAA; 

ii) said nucleotide does not comprise a nucleotide sequence selected from the group 
consisting of CCAAT, ATTGG, GCAAT and ATTGC; 

iii) said nucleotide sequence does not comprise a sequence selected from the group 
consisting of ATTTA, AAGGT, AGGTA, GGTA or GCAGG; 

iv) said nucleotide sequence does not comprise a GC stretch consisting of 7 
consecutive nucleotides selected from the group of G or C; 

v) said nucleotide sequence does not comprise a AT stretch consisting of 5 
consecutive nucleotides selected from the group of A or T; and 

vi) codons of said nucleotide sequence coding for Leu, Ile, Val, Ser, Pro, Thr, Ala do 
not comprise TA or GC duplets in positions 2 and 3 of said codons. 

25. An isolated DNA fragment comprising the nucleotide sequence of SEQ ID No 2, wherein 
the GC content of said nucleotide sequence is about 50 to about 60%, provided that 

i) said nucleotide sequence from position 28 to position 30 is not AAG; 

ii) if the nucleotide sequence from position 34 to position 36 is AAT then the 
nucleotide sequence from position 37 to position 39 is not ATT or ATA; 

iii) if the nucleotide sequence from position 34 to position 36 is AAC then the 
nucleotide sequence from position 37 to position 39 is not ATT simultaneously 
with the nucleotide sequence from position 40 to position 42 being AAA; 

iv) if the nucleotide sequence from position 34 to position 36 is AAC then the 
nucleotide sequence from position 37 to position 39 is not ATA; 

v) if the nucleotide sequence from position 37 to position 39 is ATT or ATA then 
the nucleotide sequence from position 40 to 42 is not AAA; 

vi) the nucleotide sequence from position 49 to position 51 is not CAA; 

vii) the nucleotide sequence from position 52 to position 54 is not GTA; 
viii) the codons from the nucleotide sequence from position 58 to position 63 are 
chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

ix) if the nucleotide sequence from position 67 to position 69 is CCC then the 
nucleotide sequence from position 70 to position 72 is not AAT; 

x) if the nucleotide sequence from position 76 to position 78 is AAA then the 
nucleotide sequence from position 79 to position 81 is not TTG simultaneously 
with the nucleotide sequence from position 82 to 84 being CTN; 

xi) if the nucleotide sequence from position 79 to position 81 is TTA or CTA then 
the nucleotide sequence from position 82 to position 84 is not TTA; 

xii) the nucleotide sequence from position 88 to position 90 is not GAA; 

xiii) if the nucleotide sequence from position 91 to position 93 is TAT, then the 
nucleotide sequence from position 94 to position 96 is not AAA; 

xiv) if the nucleotide sequence from position 97 to position 99 is 
TCC or TCG or AGC then the nucleotide sequence from position 100 to 102 is 
not CCA simultaneously with the nucleotide sequence from position 103 to 105 
being TTR; 

xv) if the nucleotide sequence from position 100 to 102 is CAA then the nucleotide 
sequence from position 103 to 105 is not TTA; 

xvi) if the nucleotide sequence from position 109 to position 111 is GAA then the 
nucleotide sequence from 112 to 114 is not TTA; 

xvii) if the nucleotide sequence from position 115 to 117 is AAT then the 
nucleotide sequence from position 118 to position 120 is not ATT or ATA; 

xviii) if the nucleotide sequence from position 121 to 123 is GAG then the 
nucleotide sequence from position 124 to position 126; 

xix) the nucleotide sequence from position 133 to 135 is not GCA; 

xx) the nucleotide sequence from position 139 to position 141 is not ATT; 

xxi) if the nucleotide sequence from position 142 to position 144 is GGA then the 
nucleotide sequence from position 145 to position 147 is not TTA; 

xxii) if the nucleotide sequence from position 145 to position 147 is TTA then the 
nucleotide sequence from position 148 to position 150 is not ATA simultaneously 
with the nucleotide sequence from position 151 to 153 being TTR; 
xxiii) if the nucleotide sequence from position 145 to position 147 is CTA then the 
nucleotide sequence from position 148 to position 150 is not ATA simultaneously 
with the nucleotide sequence from position 151 to 153 being TTR; 

xxiv) if the nucleotide sequence from position 148 to position 150 is ATA then the 
nucleotide sequence from position 151 to position 153 is not CTA or TTG; 

xxv) if the nucleotide sequence from position 160 to position 162 is GCA then the 
nucleotide sequence from position 163 to position 165 is not TAC; 

xxvi) if the nucleotide sequence from position 163 to position 165 is TAT then the 
nucleotide sequence from position 166 to position 168 is not ATA simultaneously 
with the nucleotide sequence from position 169 to position 171 being AGR; 

xxvii) the codons from the nucleotide sequence from position 172 to position 177 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise GCAGG; 

xxviii) the codons from the nucleotide sequence from position 178 to position 186 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise AGGTA; 

xxix) if the nucleotide sequence from position 193 to position 195 is TAT, then the 
nucleotide sequence from position 196 to position 198 is not TGC; 

xxx) the nucleotide sequence from position 202 to position 204 is not CAA; 

xxxi) the nucleotide sequence from position 217 to position 219 is not AAT; 

xxxii) if the nucleotide sequence from position 220 to position 222 is AAA then the 
nucleotide sequence from position 223 to position 225 is not GCA; 

xxxiii) if the nucleotide sequence from position 223 to position 225 is GCA then the 
nucleotide sequence from position 226 to position 228 is not TAC; 

xxxiv) if the nucleotide sequence from position 253 to position 255 is GAC, then the 
nucleotide sequence from position 256 to position 258 is not CAA; 

xxxv) if the nucleotide sequence from position 277 to position 279 is CAT, then the 
nucleotide sequence from position 280 to position 282 is not AAA; 

xxxvi) the codons from the nucleotide sequence from position 298 to position 303 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

xxxvii) if the nucleotide sequence from position 304 to position 306 is GGC then the 
nucleotide sequence from position 307 to position 309 is not AAT; 

xxxviii) the codons from the nucleotide sequence from position 307 to position 
312 are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

xxxix) the codons from the nucleotide sequence from position 334 to position 342 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

xl) if the nucleotide sequence from position 340 to position 342 is AAG then the 
nucleotide sequence from position 343 to 345 is not CAT; 

xli) if the nucleotide sequence from position 346 to position 348 is CAA then the 
nucleotide sequence from position 349 to position 351 is not GCA; 

xlii) the codons from the nucleotide sequence from position 349 to position 357 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

xliii) the nucleotide sequence from position 355 to position 357 is not AAT; 

xliv) if the nucleotide sequence from position 358 to position 360 is AAA then the 
nucleotide sequence from position 361 to 363 is not TTG; 

xlv) if the nucleotide sequence from position 364 to position 366 is GCC then the 
nucleotide sequence from position 367 to position 369 is not AAT; 

xlvi) the codons from the nucleotide sequence from position 367 to position 378 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

xlvii) if the nucleotide sequence from position 382 to position 384 is AAT then the 
nucleotide sequence from position 385 to position 387 is not AAT; 

xlviii) the nucleotide sequence from position 385 to position 387 is not AAT; 

xlix) if the nucleotide sequence from position 400 to 402 is CCC, then the 
nucleotide sequence from position 403 to 405 is not AAT; 

l) if the nucleotide sequence from position 403 to 405 is AAT, then the nucleotide 
sequence from position 406 to 408 is not AAT; 

li) the codons from the nucleotide sequence from position 406 to position 411 are 
chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

lii) the codons from the nucleotide sequence from position 421 to position 426 are 
chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

liii) the nucleotide sequence from position 430 to position 432 is not CCA; 

liv) if the nucleotide sequence from position 436 to position 438 is TCA then the 
nucleotide sequence from position 439 to position 441 is not TTG; 

lv) the nucleotide sequence from position 445 to position 447 is not TAT; 

lvi) the nucleotide sequence from position 481 to 483 is not AAT; 

lvii) if the nucleotide sequence from position 484 to position 486 is AAA, then the 
nucleotide sequence from position 487 to position 489 is not AAT simultaneously 
with the nucleotide sequence from position 490 to position 492 being AGY; 

lviii) if the nucleotide sequence from position 490 to position 492 is TCA, then the 
nucleotide sequence from position 493 to position 495 is not ACC simultaneously 
with the nucleotide sequence from position 496 to 498 being AAY; 

lix) if the nucleotide sequence from position 493 to position 495 is ACC, then the 
nucleotide sequence from position 496 to 498 is not AAT; 

lx) the nucleotide sequence from position 496 to position 498 is not AAT; 

lxi) if the nucleotide sequence from position 499 to position 501 is AAA then the 
nucleotide sequence from position 502 to position 504 is not TCA or AGC; 

lxii) if the nucleotide sequence from position 508 to position 510 is GTA, then the 
nucleotide sequence from position 511 to 513 is not TTA; 

lxiii) if the nucleotide sequence from position 514 to position 516 is AAT then the 
nucleotide sequence from position 517 to position 519 is not ACA; 

lxiv) if the nucleotide sequence from position 517 to position 519 is ACC or ACG, 
then the nucleotide sequence from position 520 to position 522 is not CAA 
simultaneously with the nucleotide sequence from position 523 to position 525 
being TCN; 

lxv) the codons from the nucleotide sequence from position 523 to position 531 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

lxvi) if the nucleotide sequence from position 544 to position 546 is GAA then the 
nucleotide sequence from position 547 to position 549 is not TAT, 
simultaneously with the nucleotide sequence from position 550 to position 552 
being TTR; 

lxvii) the codons from the nucleotide sequence from position 547 to position 552 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

lxviii) if the nucleotide sequence from position 559 to position 561 is GGA then the 
nucleotide sequence from position 562 to position 564 is not TTG simultaneously 
with the nucleotide sequence from position 565 to 567 being CGN; 

lxix) if the nucleotide sequence from position 565 to position 567 is CGC then the 
nucleotide sequence from position 568 to position 570 is not AAT; 

lxx) the nucleotide sequence from position 568 to position 570 is not AAT; 

lxxi) if the nucleotide sequence from position 574 to position 576 is TTC then the 
nucleotide sequence from position 577 to position 579 is not CAA simultaneously 
with the nucleotide sequence from position 580 to position 582 being TTR; 

lxxii) if the nucleotide sequence from position 577 to position 579 is CAA then the 
nucleotide sequence from position 580 to position 582 is not TTA; 

lxxiii) if the nucleotide sequence from position 583 to position 585 is AAT then the 
nucleotide sequence from position 586 to 588 is not TGC; 

lxxiv) the nucleotide sequence from position 595 to position 597 is not AAA; 

lxxv) if the nucleotide sequence from position 598 to position 600 is ATT then the 
nucleotide sequence from position 601 to position 603 is not AAT; 

lxxvi) the nucleotide sequence from position 598 to position 600 is not ATA; 

lxxvii) the nucleotide sequence from position 601 to position 603 is not AAT; 

lxxviii) if the nucleotide sequence from position 604 to position 606 is AAA then the 
nucleotide sequence from position 607 to position 609 is not AAT; 

lxxix) the nucleotide sequence from position 607 to position 609 is not AAT; 

lxxx) the nucleotide sequence from position 613 to position 615 is not CCA; 

lxxxi) if the nucleotide sequence from position 613 to position 615 is CCG, then the 
nucleotide sequence from position 616 to position 618 is not ATA; 

lxxxii) if the nucleotide sequence from position 616 to the nucleotide at position 618 
is ATA, then the nucleotide sequence from position 619 to 621 is not ATA; 

lxxxiii) if the nucleotide sequence from position 619 to position 621 is ATA, then the 
nucleotide sequence from position 622 to position 624 is not TAC; 

lxxxiv) the nucleotide sequence from position 619 to position 621 is not ATT; 

lxxxv) the codons from the nucleotide sequence from position 640 to position 645 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

lxxxvi) if the nucleotide sequence from position 643 to position 645 is TTA then the 
nucleotide sequence from position 646 to position 648 is not ATA; 

lxxxvii) if the nucleotide sequence from position 643 to position 645 is CTA 
then the nucleotide sequence from position 646 to position 648 is not ATA; 

lxxxviii) the codons from the nucleotide sequence from position 655 to position 
660 are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

lxxxix) if the nucleotide sequence from position 658 to 660 is TTA or CTA then the 
nucleotide sequence from position 661 to position 663 is not ATT or ATC; 

xc) the nucleotide sequence from position 661 to position 663 is not ATA; 

xci) if the nucleotide sequence from position 661 to position 663 is ATT then the 
nucleotide sequence from position 664 to position 666 is not AAA; 

xcii) the codons from the nucleotide sequence from position 670 to position 675 
are chosen according to the choices provided in such a way that the resulting 
nucleotide sequence does not comprise ATTTA; 

xciii) if the nucleotide sequence from position 691 to position 693 is TAT then the 
nucleotide sequence from position 694 to position 696 is not AAA; 

xciv) if the nucleotide sequence from position 694 to position 696 is AAA then the 
nucleotide sequence from position 697 to position 699 is not TTG; 

xcv) if the nucleotide sequence from position 700 to position 702 is CCC then the 
nucleotide sequence from position 703 to position 705 is not AAT; 

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xcvi) if the nucleotide sequence from position 703 to position 705 is AAT then the 
nucleotide sequence from position 706 to position 708 is not ACA or ACT; 

xcvii) if the nucleotide sequence from position 706 to position 708 is ACA then the 
nucleotide sequence from position 709 to 71 1 is not ATA simultaneously with the 
nucleotide sequence from position 7 1 2 to position 714 being AGY; 

xcviii) said nucleotide sequence does not comprise the codons TTA, CTA, ATA, 
GTA, TCG, CCG, ACG and GCG; 

xcix) said nucleotide sequence does not comprise a GC stretch consisting of 7 
consecutive nucleotides selected from the group of G or C; and 

c) said nucleotide sequence does not comprise an AT stretch consisting of 5
consecutive nucleotides selected from the group of A or T.
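
The three sequence-wide provisos that close this claim (items xcviii to c) can be checked mechanically. The following is a minimal sketch, not part of the claimed subject matter: the function name `violations` is hypothetical, and the sketch assumes the sequence is supplied in reading frame starting at position 1 and that a "stretch" means an uninterrupted run of the stated length or longer.

```python
import re

# Codons excluded by item xcviii) of the claim.
FORBIDDEN_CODONS = {"TTA", "CTA", "ATA", "GTA", "TCG", "CCG", "ACG", "GCG"}


def violations(seq: str) -> list[str]:
    """Return the sequence-wide provisos (items xcviii to c) that seq violates."""
    seq = seq.upper()
    problems = []
    # Split into in-frame codons; an incomplete trailing codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    used = sorted(set(codons) & FORBIDDEN_CODONS)
    if used:
        problems.append("forbidden codons: " + ", ".join(used))
    # Item xcix): no run of 7 consecutive G or C nucleotides.
    if re.search(r"[GC]{7}", seq):
        problems.append("GC stretch of 7 or more")
    # Item c): no run of 5 consecutive A or T nucleotides.
    if re.search(r"[AT]{5}", seq):
        problems.append("AT stretch of 5 or more")
    return problems


# The first 30 nucleotides of SEQ ID No 4 pass all three checks.
print(violations("atggccaagcctcccaagaagaagcgcaaa"))
```

The position-specific provisos (items i to xcvii) are not covered by this sketch; they would need one rule per item.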

26. An isolated DNA fragment comprising the nucleotide sequence of SEQ ID No 3 wherein 
the GC content of said nucleotide sequence is about 50 to about 60%, provided that 

a) if the nucleotide sequence from position 121 to position 123 is GAG then the 
nucleotide sequence from position 124 to 126 is not CAA; 

b) if the nucleotide sequence from position 253 to position 255 is GAC then the 
nucleotide sequence from position 256 to 258 is not CAA; 

c) if the nucleotide sequence from position 277 to position 279 is CAT then the 
nucleotide sequence from position 280 to 282 is not AAA; 

d) if the nucleotide sequence from position 340 to position 342 is AAG then the 
nucleotide sequence from position 343 to position 345 is not CAT; 

e) if the nucleotide sequence from position 490 to position 492 is TCA then the 
nucleotide sequence from position 493 to position 495 is not ACC; 

f) if the nucleotide sequence from position 499 to position 501 is AAA then the 
nucleotide sequence from position 502 to 504 is not TCA or AGC; 

g) if the nucleotide sequence from position 517 to position 519 is ACC then the
nucleotide sequence from position 520 to position 522 is not CAA simultaneously with
the nucleotide sequence from position 523 to 525 being TCN;

h) if the nucleotide sequence from position 661 to position 663 is ATT then the 
nucleotide sequence from position 664 to position 666 is not AAA;

i) the codons from the nucleotide sequence from position 7 to position 15 are chosen
according to the choices provided in such a way that the resulting nucleotide
sequence does not comprise a stretch of seven contiguous nucleotides from the group
of G or C;

j) the codons from the nucleotide sequence from position 61 to position 69 are chosen 
according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of seven contiguous nucleotides from the group 
of G or C;

k) the codons from the nucleotide sequence from position 130 to position 138 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of seven contiguous nucleotides from the group 
of G or C; 

l) the codons from the nucleotide sequence from position 268 to position 279 are
chosen according to the choices provided in such a way that the resulting nucleotide
sequence does not comprise a stretch of seven contiguous nucleotides from the group
of G or C;

m) the codons from the nucleotide sequence from position 322 to position 333 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of seven contiguous nucleotides from the group 
of G or C;

n) the codons from the nucleotide sequence from position 460 to position 468 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of seven contiguous nucleotides from the group 
of G or C;

o) the codons from the nucleotide sequence from position 13 to position 27 are chosen 
according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

p) the codons from the nucleotide sequence from position 37 to position 48 are chosen 
according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T;

q) the codons from the nucleotide sequence from position 184 to position 192 are
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

r) the codons from the nucleotide sequence from position 214 to position 219 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

s) the codons from the nucleotide sequence from position 277 to position 285 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

t) the codons from the nucleotide sequence from position 388 to position 396 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

u) the codons from the nucleotide sequence from position 466 to position 474 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

v) the codons from the nucleotide sequence from position 484 to position 489 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

w) the codons from the nucleotide sequence from position 571 to position 576 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

x) the codons from the nucleotide sequence from position 598 to position 603 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T;

y) the codons from the nucleotide sequence from position 604 to position 609 are
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

z) the codons from the nucleotide sequence from position 613 to position 621 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

aa) the codons from the nucleotide sequence from position 646 to position 651 are
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; 

bb) the codons from the nucleotide sequence from position 661 to position 666 are
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T; and 

cc) the codons from the nucleotide sequence from position 706 to position 714 are 
chosen according to the choices provided in such a way that the resulting nucleotide 
sequence does not comprise a stretch of five contiguous nucleotides from the group 
of A or T. 
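
The GC-content limit and the conditional codon provisos of claim 26 lend themselves to a simple check. The sketch below is illustrative only: the helper names `gc_content`, `codon_at` and `violates_pair_rule` are hypothetical, positions are 1-based as in the claim, and the sequences in the usage lines are toy examples, not SEQ ID No 3.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C nucleotides in seq."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def codon_at(seq: str, start: int) -> str:
    """Codon whose first nucleotide sits at 1-based position `start`."""
    return seq[start - 1:start + 2].upper()


def violates_pair_rule(seq, first_pos, first_codon, second_pos, excluded):
    """True if the codon at first_pos equals first_codon while the codon
    at second_pos is one of the excluded codons."""
    return (codon_at(seq, first_pos) == first_codon
            and codon_at(seq, second_pos) in excluded)


# Proviso a): if positions 121 to 123 are GAG, positions 124 to 126 are not CAA.
toy = "A" * 120 + "GAGCAA"
print(violates_pair_rule(toy, 121, "GAG", 124, {"CAA"}))
print(gc_content("GGCCAATT"))
```

Provisos of the "does not comprise a stretch" type (items i to cc) would instead be checked with a run-length search over the named position range.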

27. An isolated DNA sequence according to claim 26, characterized in that it contains a 
nucleotide sequence differing from the nucleotide sequence of SEQ ID No 4 in only one 
position. 

28. An isolated DNA sequence according to claim 26, characterized in that it contains a 
nucleotide sequence differing from the nucleotide sequence of SEQ ID No 4 in only ten 
positions. 
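
Claims 27 and 28 turn on the number of positions at which a sequence differs from SEQ ID No 4. A minimal sketch of that count follows, assuming that "differing in a position" means a nucleotide substitution between two sequences of equal length (insertions and deletions are not handled); the function name `differing_positions` is hypothetical.

```python
def differing_positions(a: str, b: str) -> list[int]:
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1
            for i, (x, y) in enumerate(zip(a.upper(), b.upper()))
            if x != y]


# Toy example: the two sequences differ only at position 6.
print(differing_positions("atggcc", "atggct"))
```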

29. An isolated DNA sequence comprising the nucleotide sequence of SEQ ID No 4.

30. A chimeric gene comprising the isolated DNA fragment according to any one of claims
24 to 29 operably linked to a plant-expressible promoter.

31. Use of a chimeric gene according to claim 30 to insert a foreign DNA into an I-SceI
recognition site in the genome of a plant.

SEQUENCE LISTING

<110> Bayer Bioscience N.V.
      D'Halluin, Kathleen
      Vanderstraeten, Chantal
      Ruiter, Rene

<120> Improved targeted DNA insertion in plants

<130> BCS 03-2007 WO1

<150> EP 03078700.6
<151> 2003-11-18

<160> 7

<170> PatentIn version 3.0

<210> 1
<211> 244
<212> PRT
<213> Saccharomyces cerevisiae

<400> 1

Met Ala Lys Pro Pro Lys Lys Lys Arg Lys Val Asn Ile Lys Lys Asn
1               5                   10                  15

Gln Val Met Asn Leu Gly Pro Asn Ser Lys Leu Leu Lys Glu Tyr Lys
            20                  25                  30

Ser Gln Leu Ile Glu Leu Asn Ile Glu Gln Phe Glu Ala Gly Ile Gly
        35                  40                  45

Leu Ile Leu Gly Asp Ala Tyr Ile Arg Ser Arg Asp Glu Gly Lys Thr
    50                  55                  60

Tyr Cys Met Gln Phe Glu Trp Lys Asn Lys Ala Tyr Met Asp His Val
65                  70                  75                  80

Cys Leu Leu Tyr Asp Gln Trp Val Leu Ser Pro Pro His Lys Lys Glu
                85                  90                  95

Arg Val Asn His Leu Gly Asn Leu Val Ile Thr Trp Gly Ala Gln Thr
            100                 105                 110

Phe Lys His Gln Ala Phe Asn Lys Leu Ala Asn Leu Phe Ile Val Asn
        115                 120                 125

Asn Lys Lys Thr Ile Pro Asn Asn Leu Val Glu Asn Tyr Leu Thr Pro
    130                 135                 140

Met Ser Leu Ala Tyr Trp Phe Met Asp Asp Gly Gly Lys Trp Asp Tyr
145                 150                 155                 160

Asn Lys Asn Ser Thr Asn Lys Ser Ile Val Leu Asn Thr Gln Ser Phe
                165                 170                 175

Thr Phe Glu Glu Val Glu Tyr Leu Val Lys Gly Leu Arg Asn Lys Phe
            180                 185                 190

Gln Leu Asn Cys Tyr Val Lys Ile Asn Lys Asn Lys Pro Ile Ile Tyr
        195                 200                 205

Ile Asp Ser Met Ser Tyr Leu Ile Phe Tyr Asn Leu Ile Lys Pro Tyr
    210                 215                 220

Leu Ile Pro Gln Met Met Tyr Lys Leu Pro Asn Thr Ile Ser Ser Glu
225                 230                 235                 240

Thr Phe Leu Lys



<210> 2
<211> 732
<212> DNA
<213> Artificial

<220>
<223> synthetic DNA sequence encoding I-SceI (IUPAC code)

<220>
<221> misc_feature
<222> (6)..(6)
<223> N = A, G, C or T

<220>
<221> variation
<222> (25)..(27)
<223> AGR

<220>
<221> variation
<222> (61)..(63)
<223> TTR

<220>
<221> variation
<222> (73)..(75)
<223> AGY

<220>
<221> variation
<222> (79)..(81)
<223> TTR

<220>
<221> variation
<222> (82)..(84)
<223> TTR



<220>
<221> variation
<222> (97)..(99)
<223> AGY

<220>
<221> variation
<222> (103)..(105)
<223> TTR

<220>
<221> variation
<222> (112)..(114)
<223> TTR

<220>
<221> variation
<222> (145)..(147)
<223> TTR

<220>
<221> variation
<222> (151)..(153)
<223> TTR

<220>
<221> variation
<222> (169)..(171)
<223> AGR

<220>
<221> variation
<222> (172)..(174)
<223> AGY

<220>
<221> variation
<222> (175)..(177)
<223> AGR

<220>
<221> variation
<222> (244)..(246)
<223> TTR

<220>
<221> variation
<222> (247)..(249)
<223> TTR

<220>
<221> variation
<222> (265)..(267)
<223> TTR

<220>
<221> variation
<222> (268)..(270)
<223> AGY

<220>
<221> variation
<222> (289)..(291)
<223> AGR

<220>
<221> variation
<222> (301)..(303)
<223> TTR

<220>
<221> variation
<222> (310)..(312)
<223> TTR

<220>
<221> variation
<222> (361)..(363)
<223> TTR

<220>
<221> variation
<222> (370)..(372)
<223> TTR

<220>
<221> variation
<222> (409)..(411)
<223> TTR

<220>
<221> variation
<222> (424)..(426)
<223> TTR



<220>
<221> variation
<222> (436)..(438)
<223> AGY

<220>
<221> variation
<222> (439)..(441)
<223> TTR

<220>
<221> variation
<222> (490)..(492)
<223> AGY

<220>
<221> variation
<222> (502)..(504)
<223> AGY

<220>
<221> variation
<222> (511)..(513)
<223> TTR

<220>
<221> variation
<222> (523)..(525)
<223> AGY

<220>
<221> variation
<222> (550)..(552)
<223> TTR

<220>
<221> variation
<222> (562)..(564)
<223> TTR

<220>
<221> variation
<222> (565)..(567)
<223> AGR

<220>
<221> variation
<222> (580)..(582)
<223> TTR



<220>
<221> variation
<222> (631)..(633)
<223> AGY

<220>
<221> variation
<222> (637)..(639)
<223> AGY

<220>
<221> variation
<222> (643)..(645)
<223> TTR

<220>
<221> variation
<222> (658)..(660)
<223> TTR

<220>
<221> variation
<222> (673)..(675)
<223> TTR

<220>
<221> variation
<222> (697)..(699)
<223> TTR

<220>
<221> variation
<222> (712)..(714)
<223> AGY

<220>
<221> variation
<222> (715)..(717)
<223> AGY

<220>
<221> variation
<222> (727)..(729)
<223> TTR



<220>
<221> misc_feature
<222> (12)..(12)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (15)..(15)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (27)..(27)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (33)..(33)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (54)..(54)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (63)..(63)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (66)..(66)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (69)..(69)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (75)..(75)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (81)..(81)
<223> N = A, G, C or T



<220>
<221> misc_feature
<222> (84)..(84)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (99)..(99)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (105)..(105)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (114)..(114)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (135)..(135)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (138)..(138)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (144)..(144)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (147)..(147)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (153)..(153)
<223> N = A, G, C or T

WO 2005/049842    PCT/EP2004/013122

<220>
<221> misc_feature
<222> (156)..(156)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (162)..(162)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (171)..(171)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (174)..(174)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (177)..(177)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (186)..(186)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (192)..(192)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (225)..(225)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (240)..(240)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (246)..(246)
<223> N = A, G, C or T



<220>
<221> misc_feature
<222> (249)..(249)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (264)..(264)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (267)..(267)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (270)..(270)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (273)..(273)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (276)..(276)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (291)..(291)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (294)..(294)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (303)..(303)
<223> N = A, G, C or T



<220>
<221> misc_feature
<222> (306)..(306)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (312)..(312)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (315)..(315)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (321)..(321)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (327)..(327)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (330)..(330)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (336)..(336)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (351)..(351)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (363)..(363)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (366)..(366)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (372)..(372)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (381)..(381)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (396)..(396)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (402)..(402)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (411)..(411)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (414)..(414)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (426)..(426)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (429)..(429)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (432)..(432)
<223> N = A, G, C or T



<220>
<221> misc_feature
<222> (438)..(438)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (441)..(441)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (444)..(444)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (465)..(465)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (468)..(468)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (492)..(492)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (495)..(495)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (504)..(504)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (510)..(510)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (513)..(513)
<223> N = A, G, C or T



<220>
<221> misc_feature
<222> (519)..(519)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (525)..(525)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (531)..(531)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (543)..(543)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (552)..(552)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (552)..(552)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (555)..(555)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (561)..(561)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (564)..(564)
<223> N = A, G, C or T



<220>
<221> misc_feature
<222> (567)..(567)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (582)..(582)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (594)..(594)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (615)..(615)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (633)..(633)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (639)..(639)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (645)..(645)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (660)..(660)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (669)..(669)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (675)..(675)
<223> N = A, G, C or T



<220>
<221> misc_feature
<222> (681)..(681)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (699)..(699)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (702)..(702)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (708)..(708)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (714)..(714)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (717)..(717)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (723)..(723)
<223> N = A, G, C or T

<220>
<221> misc_feature
<222> (729)..(729)
<223> N = A, G, C or T

<400> 2 

atggcnaarc cnccnaaraa raarcgnaar gtnaayatha araaraayca rgtnatgaay  60
ctnggnccna aytcnaarct nctnaargar tayaartcnc arctnathga rctnaayath  120
garcarttyg argcnggnat hggnctnath ctnggngayg cntayathcg ntcncgngay  180
garggnaara cntaytgyat gcarttygar tggaaraaya argcntayat ggaycaygtn  240
tgyctnctnt aygaycartg ggtnctntcn ccnccncaya araargarcg ngtnaaycay  300
ctnggnaayc tngtnathac ntggggngcn caracnttya arcaycargc nttyaayaar  360
ctngcnaayc tnttyathgt naayaayaar aaracnathc cnaayaayct ngtngaraay  420
tayctnacnc cnatgtcnct ngcntaytgg ttyatggayg ayggnggnaa rtgggaytay  480
aayaaraayt cnacnaayaa rtcnathgtn ctnaayacnc artcnttyac nttygargar  540
gtngartayc tngtnaargg nctncgnaay aarttycarc tnaaytgyta ygtnaarath  600
aayaaraaya arccnathat htayathgay tcnatgtcnt ayctnathtt ytayaayctn  660
athaarccnt ayctnathcc ncaratgatg tayaarctnc cnaayacnat htcntcngar  720
acnttyctna ar  732
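
Because SEQ ID No 2 is written in IUPAC ambiguity codes, a concrete sequence can be compared against it symbol by symbol. The sketch below is illustrative only: the function name `matches_template` is hypothetical, and the alternative codons listed under the `variation` features (for example AGR in place of the codon at positions 25 to 27) are deliberately not taken into account.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_template(seq: str, template: str) -> bool:
    """True if every nucleotide of seq is allowed by the IUPAC symbol
    at the same position of template."""
    if len(seq) != len(template):
        return False
    return all(base in IUPAC[symbol]
               for base, symbol in zip(seq.upper(), template.upper()))


# First 30 positions of SEQ ID No 2 against the first 30 nucleotides of SEQ ID No 4.
print(matches_template("atggccaagcctcccaagaagaagcgcaaa",
                       "atggcnaarccnccnaaraaraarcgnaar"))
```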



<210> 3
<211> 732
<212> DNA
<213> Artificial

<220>
<223> preferred synthetic DNA sequence encoding I-SceI (IUPAC code)

<220>
<221> variation
<222> (25)..(27)
<223> AGA

<220>
<221> variation
<222> (73)..(75)
<223> AGC

<220>
<221> variation
<222> (97)..(99)
<223> AGC

<220>
<221> variation
<222> (169)..(171)
<223> AGA

<220>
<221> variation
<222> (172)..(174)
<223> AGC



<220>
<221> variation
<222> (175)..(177)
<223> AGA

<220>
<221> variation
<222> (268)..(270)
<223> AGC

<220>
<221> variation
<222> (289)..(291)
<223> AGA

<220>
<221> variation
<222> (436)..(438)
<223> AGC

<220>
<221> variation
<222> (490)..(492)
<223> AGC

<220>
<221> variation
<222> (502)..(504)
<223> AGC

<220>
<221> variation
<222> (523)..(525)
<223> AGC

<220>
<221> variation
<222> (565)..(567)
<223> AGA

<220>
<221> variation
<222> (631)..(633)
<223> AGC



<220>
<221> variation
<222> (637)..(639)
<223> AGC

<220>
<221> variation
<222> (712)..(714)
<223> AGC

<220>
<221> variation
<222> (715)..(717)
<223> AGC



<400> 3
atggcyaarc chcchaaraa raarcgsaaa gtsaacatya araaraacca ggtsatgaac  60
ctsggmccha actcmaarct sctsaargag tacaartcmc arctsatyga rctsaacaty  120
garcarttcg argcyggmat cggmctsaty ctsggmgayg cytacatycg stcmcgsgay  180
garggmaara cytactgyat gcagttcgar tggaaraaca argcytacat ggaycaygts  240
tgyctsctst acgaycartg ggtsctstcm cchcchcaya araargarcg sgtsaaccay  300
ctsggmaacc tsgtsatyac ytggggmgcy caracyttca arcaycargc yttcaacaar  360
ctsgcsaacc tsttcatyct saacaacaar aaracyatyc chaacaacct sgtsgaraac  420
tacctsacyc cyatgtcmct sgcytactgg ttcatggayg ayggmggmaa rtgggaytac  480
aacaaraact cmacyaacaa rtcmatygts ctsaacacyc artcmttcac yttcgargar  540
gtsgartacc tsgtsaargg mctscgsaac aarttccarc tsaactgyta cgtsaagaty  600
aacaaraaca arccyatyat ctacatygay tcmatgtcmt acctsatytt ctacaaccts  660
atyaarccht acctsatycc hcaratgatg tacaarctsc chaacacyat ytcmtcmgar  720
acyttcctsa ar  732



<210> 4
<211> 732
<212> DNA
<213> Artificial

<220>
<223> preferred synthetic DNA sequence encoding I-SceI (IUPAC code)

<400> 4
atggccaagc ctcccaagaa gaagcgcaaa gtgaacatca agaagaacca ggtgatgaac  60
ctgggaccta acagcaagct cctgaaggag tacaagagcc agctgatcga             120
gagcagttcg aagctggcat cggcctgatc ctgggcgatg cctacatcag             180
gaaggcaaga cctactgcat gcagttcgag tggaagaaca aggcctacat             240
tgtctgctgt acgaccagtg ggtcctgagc cctcctcaca agaaggagcg             300
ctgggcaacc tcgtgatcac ctggggagcc cagaccttca agcaccaggc             360
ctggccaacc tgttcatcgt gaacaacaag aagaccatcc ccaacaacct cgtggagaac  420
tacctcactc ccatgagcct ggcctactgg ttcatggacg acggaggcaa             480
aacaagaaca gcaccaacaa gtcaattgtg ctgaacaccc aaagcttcac cttcgaagaa  540
gtggagtacc tcgtcaaggg cctgcgcaac aagttccagc tgaactgcta cgtgaagatc  600
aacaagaaca agcctatcat ctacatcgac agcatgagct acctgatctt ctacaacctg  660
atcaagccat acctgatccc tcagatgatg tacaagctgc ccaacaccat cagcagcgag  720
accttcctga ag  732
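
One way to confirm that a coding sequence such as SEQ ID No 4 encodes the protein of SEQ ID No 1 is to translate it with the standard genetic code and compare the result. The sketch below is illustrative: the names `CODON_TABLE` and `translate` are hypothetical, and only the first 30 nucleotides of SEQ ID No 4 are used.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes.
    An incomplete trailing codon is ignored; stop codons appear as '*'."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(0, len(seq) - len(seq) % 3, 3))


# First ten codons of SEQ ID No 4; SEQ ID No 1 begins
# Met Ala Lys Pro Pro Lys Lys Lys Arg Lys.
print(translate("atggccaagcctcccaagaagaagcgcaaa"))
```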



<210> 5
<211> 3262
<212> DNA
<213> Artificial

<220>
<223> T-DNA of pTTAM78 (target locus)

<220>
<221> misc_feature
<222> (1)..(25)
<223> Right T-DNA border sequence

<220>
<221> misc_feature
<222> (26)..(72)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (73)..(333)
<223> 3' nos (complement)

<220>
<221> misc_feature
<222> (334)..(351)
<223> synthetic polylinker sequence



<220>
<221> misc_feature
<222> (352)..(903)
<223> bar sequence (complement)

<220>
<221> misc_feature
<222> (904)..(928)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (929)..(946)
<223> I-SceI recognition site

<220>
<221> misc_feature
<222> (947)..(967)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (968)..(1171)
<223> 3'g7

<220>
<221> misc_feature
<222> (1172)..(1290)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (1291)..(1577)
<223> promoter nopaline synthetase gene

<220>
<221> misc_feature
<222> (1578)..(1590)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (1591)..(2394)
<223> nptII

<220>
<221> misc_feature
<222> (2395)..(2567)
<223> 3' neo



<220>
<221> misc_feature
<222> (2568)..(3183)
<223> 3' ocs

<220>
<221> misc_feature
<222> (3184)..(3234)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (3235)..(3262)
<223> left T-DNA border sequence



<400> 5
aattacaacg gtatatatcc tgccagtact cggccgtcga cctgcaggca attggtacct  60
agaggatctt cccgatctag taacatagat gacaccgcgc gcgataattt atcctagttt  120
gcgcgctata ttttgttttc tatcgcgtat taaatgtata attgcgggac tctaatcata  180
aaaacccatc tcataaataa cgtcatgcat tacatgttaa ttattacatg cttaacgtaa  240
ttcaacagaa attatatgat aatcatcgca agaccggcaa caggattcaa tcttaagaaa  300
ctttattgcc aaatgtttga acgatctgct tcggatccta gacgcgtgag atcagatctc  360
ggtgacgggc aggaccggac ggggcggtac cggcaggctg aagtccagct gccagaaacc  420
cacgtcatgc cagttcccgt gcttgaagcc ggccgcccgc agcatgccgc ggggggcata  480
tccgagcgcc tcgtgcatgc gcacgctcgg gtcgttgggc agcccgatga cagcgaccac  540
gctcttgaag ccctgtgcct ccagggactt cagcaggtgg gtgtagagcg tggagcccag  600
tcccgtccgc tggtggcggg gggagacgta cacggtcgac tcggccgtcc agtcgtaggc  660
gttgcgtgcc ttccaggggc ccgcgtaggc gatgccggcg acctcgccgt ccacctcggc  720
gacgagccag ggatagcgct cccgcagacg gacgaggtcg tccgtccact cctgcggttc  780
ctgcggctcg gtacggaagt tgaccgtgct tgtctcgatg tagtggttga cgatggtgca  840
gaccgccggc atgtccgcct cggtggcacg gcggatgtcg gccgggcgtc gttctgggtc  900
catggttata gagagagaga tagatttaat taccctgtta tccctaggcc gctgtacagg  960
ttaaaaaaaa                       atatttatta ataaaataac aagtcaggta  1020


ttatagtcca agcaaaaaca taaatttatt gatgcaagtt taaattcaga aatatttcaa  1080
taactgatta tatcagctgg tacattgccg tagatgaaag actgagtgcg atattatgtg  1140
taatacataa attgatgata tagctagctt aggcgcgcca tagatcccgt caattctcac  1200
tcattaggca ccccaggctt tacactttat gcttccggct cgtataatgt gtggaattgt  1260
gagcggataa caatttcaca caggaaacag gatcatgagc ggagaattaa gggagtcacg  1320
ttatgacccc cgccgatgac gcgggacaag ccgttttacg tttggaactg acagaaccgc  1380
aacgattgaa ggagccactc agccgcgggt ttctggagtt taatgagcta agcacatacg  1440
tcagaaacca ttattgcgcg ttcaaaagtc gcctaaggtc actatcagct agcaaatatt  1500
tcttgtcaaa aatgctccac tgacgttcca taaattcccc tcggtatcca attagagtct  1560
catattcact ctcaatcaaa gatccggccc atgatcatgt ggattgaaca agatggattg  1620
cacgcaggtt ctccggccgc ttgggtggag aggctattcg gctatgactg ggcacaacag  1680
acaatcggct gctctgatgc cgccgtgttc cggctgtcag cgcaggggcg cccggttctt  1740
tttgtcaaga ccgacctgtc cggtgccctg aatgaactgc aggacgaggc agcgcggcta  1800
tcgtggctgg ccacgacggg cgttccttgc gcagctgtgc tcgacgttgt cactgaagcg  1860
ggaagggact ggctgctatt gggcgaagtg ccggggcagg atctcctgtc atctcacctt  1920
gctcctgccg agaaagtatc catcatggct gatgcaatgc ggcggctgca tacgcttgat  1980
ccggctacct gcccattcga ccaccaagcg aaacatcgca tcgagcgagc acgtactcgg  2040
atggaagccg gtcttgtcga tcaggatgat ctggacgaag agcatcaggg gctcgcgcca  2100
gccgaactgt tcgccaggct caaggcgcgc atgcccgacg gcgaggatct cgtcgtgacc  2160
catggcgatg cctgcttgcc gaatatcatg gtggaaaatg gccgcttttc tggattcatc  2220
gactgtggcc ggctgggtgt ggcggaccgc tatcaggaca tagcgttggc tacccgtgat  2280
attgctgaag agcttggcgg cgaatgggct gaccgcttcc tcgtgcttta cggtatcgcc  2340
gctcccgatt cgcagcgcat cgccttctat cgccttcttg acgagttctt ctgagcggga  2400
ctctggggtt cgaaatgacc gaccaagcga cgcccaacct gccatcacga gatttcgatt  2460
ccaccgccgc cttctatgaa aggttgggct tcggaatcgt tttccgggac gccggctgga  2520
tgatcctcca gcgcggggat ctcatgctgg agttcttcgc ccaccccctg ctttaatgag  2580
atatgcgaga cgcctatgat cgcatgatat ttgctttcaa ttctgttgtg cacgttgtaa  2640
aaaacctgag catgtgtagc tcagatcctt accgccggtt tcggttcatt ctaatgaata  2700



Page 23 of 33 



WO 2005/049842 



PCT/EP2004/013122 



L. ct <_ v_cn_^« oy v_ 




C L. L- L (_Ct LUaa 


taatattctc cgttcaattt 


actgattgta 


2760 


CCuLdCLaCL 


4- -a 4- 4- /-f 4- -5 « r> 


aLat Loaaa L. 


gaaaacaata 


tattgtgctg 


aataggttta 


2820 


■4— — > j-% i-r ~\ y^ 4- /■*» 


4- a+-rfa4-anarr 
(-cLL.gclL.ciy dy 




acaaacaatt 


gcgttttatt 


attacaaatc 


2880 


v^clct L. UL tdoa 


ctdcicxyoyyv^cL 


fraarcocrtca 


dai — w Luaaay 


artaattaca 


taaatcttat 


2940 


tcaaatttca aaaggcccca ggggctagta tctacgacac accgagcggc gaactaataa 3000
cgttcactga agggaactcc ggttccccgc cggcgcgcat gggtgagatt ccttgaagtt 3060
gagtattggc cgtccgctct accgaaagtt acgggcacca ttcaacccgg tccagcacgg 3120
cggccgggta accgacttgc tgccccgaga attatgcagc atttttttgg tgtatgtggg 3180
ccctgtacag cggccgcgtt aacgcgtata ctctagagcg atcgccatgg agccatttac 3240
aattgaatat atcctgccgc cg 3262



<210> 6
<211> 5345
<212> DNA
<213> Artificial

<220>
<223> T-DNA of pTTA82 (repair DNA)

<220>
<221> misc_feature
<222> (1)..(25)
<223> right T-DNA border sequence

<220>
<221> misc_feature
<222> (26)..(62)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (63)..(578)
<223> bar 3' deleted (complement)

<220>
<221> misc_feature
<222> (579)..(603)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (604)..(616)
<223> partial I-SceI site

<220>
<221> misc_feature
<222> (617)..(1429)
<223> P35S3 (complement)

<220>
<221> misc_feature
<222> (1430)..(1438)
<223> partial I-SceI site

<220>
<221> misc_feature
<222> (1460)..(1663)
<223> 3' gene 7

<220>
<221> misc_feature
<222> (1664)..(1782)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (1783)..(2069)
<223> promoter of the nopaline synthetase gene

<220>
<221> misc_feature
<222> (2070)..(2082)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (2083)..(2886)
<223> nptII

<220>
<221> misc_feature
<222> (2887)..(3059)
<223> 3' neo

<220>
<221> misc_feature
<222> (3060)..(3675)
<223> 3' ocs

<220>
<221> misc_feature
<222> (3676)..(3731)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (3732)..(4246)
<223> P35S2

<220>
<221> misc_feature
<222> (4247)..(4289)
<223> AtslBL

<220>
<221> misc_feature
<222> (4290)..(4322)
<223> NLS

<220>
<221> misc_feature
<222> (4323)..(5023)
<223> I-SceI defective

<220>
<221> misc_feature
<222> (5024)..(5260)
<223> 3' 35S

<220>
<221> misc_feature
<222> (5261)..(5317)
<223> synthetic polylinker sequence

<220>
<221> misc_feature
<222> (5318)..(5345)
<223> left T-DNA border sequence

<400> 6

aattacaacg gtatatatcc tgccagtact cggccgtcga cctgcaggca attggtacga 60
tcctagacgc gtgagatcag atcctgccag aaacccacgt catgccagtt cccgtgcttg 120
aagccggccg cccgcagcat gccgcggggg gcatatccga gcgcctcgtg catgcgcacg 180
ctcgggtcgt tgggcagccc gatgacagcg accacgctct tgaagccctg tgcctccagg 240
gacttcagca ggtgggtgta gagcgtggag cccagtcccg tccgctggtg gcggggggag 300
acgtacacgg tcgactcggc cgtccagtcg taggcgttgc gtgccttcca ggggcccgcg 360
taggcgatgc cggcgacctc gccgtccacc tcggcgacga gccagggata gcgctcccgc 420
agacggacga ggtcgtccgt ccactcctgc ggttcctgcg gctcggtacg gaagttgacc 480
gtgcttgtct cgatgtagtg gttgacgatg gtgcagaccg ccggcatgtc cgcctcggtg 540
gcacggcgga tgtcggccgg gcgtcgttct gggtccatgg ttatagagag agagatagat 600
ttaattaccc tgttattaga gagagactgg tgatttcagc gtgtcctctc caaatgaaat 660
gaacttcctt atatagagga agggtcttgc gaaggatagt gggattgtgc gtcatccctt 720
acgtcagtgg agatgtcaca tcaatccact tgctttgaag acgtggttgg aacgtcttct 780
ttttccacga tgctcctcgt gggtgggggt ccatctttgg gaccactgtc ggcagaggca 840
tcttgaatga tagcctttcc tttatcgcaa tgatggcatt tgtaggagcc accttccttt 900
tctactgtcc tttcgatgaa gtgacagata gctgggcaat ggaatccgag gaggtttccc 960
gaaattatcc tttgttgaaa agtctcaata gccctttggt cttctgagac tgtatctttg 1020
acatttttgg agtagaccag agtgtcgtgc tccaccatgt tgacgaagat tttcttcttg 1080
tcattgagtc gtaaaagact ctgtatgaac tgttcgccag tcttcacggc gagttctgtt 1140
agatcctcga tttgaatctt agactccatg catggcctta gattcagtag gaactacctt 1200
tttagagact ccaatctcta ttacttgcct tggtttatga agcaagcctt gaatcgtcca 1260
tactggaata gtacttctga tcttgagaaa tatgtctttc tctgtgttct tgatgcaatt 1320
agtcctgaat cttttgactg catctttaac cttcttggga aggtatttga tctcctggag 1380
attgttactc gggtagatcg tcttgatgag acctgctgcg taggaacgct tatccctagg 1440
ccgctgtaca gggcccggga tcttgaaaga aatatagttt aaatatttat tgataaaata 1500
acaagtcagg tattatagtc caagcaaaaa cataaattta ttgatgcaag tttaaattca 1560
gaaatatttc aataactgat tatatcagct ggtacattgc cgtagatgaa agactgagtg 1620
cgatattatg tgtaatacat aaattgatga tatagctagc ttaggcgcgc catagatccc 1680
gtcaattctc actcattagg caccccaggc tttacacttt atgcttccgg ctcgtataat 1740
gtgtggaatt gtgagcggat aacaatttca cacaggaaac aggatcatga gcggagaatt 1800
aagggagtca cgttatgacc cccgccgatg acgcgggaca agccgtttta cgtttggaac 1860
tgacagaacc gcaacgattg aaggagccac tcagccgcgg gtttctggag tttaatgagc 1920
taagcacata cgtcagaaac cattattgcg cgttcaaaag tcgcctaagg tcactatcag 1980
\~ **y wbmw w va


tttcttotca 


aaaatoctcc 


actaacattc 


cataaattcc 


cctcggtatc 


2040 


caattacracrt 


ct cat at tea 


ctctcaatca 


aagatcegge 


ccatgatcat 


qtqqattgaa 


2100 


ca a naf" p/pt a t" 
Lciciycii_yya *— 


tcrcaccrcacicr 


tt eteccrcrcc 


ac 1 1 acsct t cro 

yuLtyyy L yy 


agaggctatt 


eggctatgae 


2160 


Lyy y uouaov 


aoacaat" ccics 

C-* vj u c* u \_» v_*»* vj 


ctcretetcfat 


QCCOCCCltCft 


tccggctgtc 




2220 


rrrprprrrfM" c 
v^y ^uuy y i_ »_ w 


tttttatcaa 


cyaccoraccto 


tccaortcrccc 

N#* \A V4 W V** 


tgaatgaact 


acaacracaacr 


2280 




t a t c ci t a cr c t 

I— <— X L* y y w L. l_ 


cr rr c c a c a a c a 


acrccrttcctt 


gcgcagctgt 


getcgaegtt 


2340 


pr4~ ct~ rr a a pt 


ecrcrcr aaprcicra 

Lyyydayyyd 


ct crcrctcrct a 


ttcraacLTaacr 


tacccrcrciLTca 


ggatctcctg 


2400 


L L d L- L LuuL>^ 


4- f- rrpf p'p't crc 


Ly ciy uwuvj a 


tcca tcatc/cr 


ctgatgcaat 


CFCoaccract cr 

yi— -yywyywuy 


2460 


r» ;n 4- =3 rrrf 4- 4" pt 


^ "h P"Crf pt t~ a p* 
ci uuoyyu L dL 


ct cr cecal - 1" c 

L» L y LrV^V^GL 


aacraccaao 

y a. v» t* ^» • w <i» a y 


ccraaacatccr 


catcgagega 


2520 


fipap rr"f~ a p* +* p* 


yyo uyyaayu 


ccrcrt ct* 1" fit C 
Lyy uu l uy o l 


cratcacicjat cr 

vj a way y w. \— y 


atctggacga 


agagcatcag 


2580 


yyyutuyoyu 


rarrpprra a el" 


crt t ccrccacici 

y l LLyL^ayy 


ct c^aaaccsc 

\*r — . wddy y »y w 


crcatcf cccaa 


eggegaggat 


2640 


e+* prri" fTr4~ rr ^ 
LLLy LLy tya 


LLLci uyy uya 


4-fTCct rici* t" CT 
LyLLtywu i— y 


cccraatatca 


tcrcftcrcraaaa 


tggccgcttt 


2700 


LLLLjya L. LLd 


uv-yauty Lyy 


L-uyyL>Lyyy l 


cr t* o cr c cr cr a c c 
y LyyL>yywL»L 


crctatcacfLTa 


catagcgttg 


2760 


i^t /— +- =a p»p*<T , r4- n 
ytLa uy uy 


af- aff" prp»4- rra 


Ana crct tr ncrc 
oyoyLL Ly y l> 


rffTCCfaat crcrcr 

y y L»y a a. uy y y 


ctcracccrctt 


cctcgtgctt 


2820 


l auyy La. Loy 


p , p , pfe4-/— /■"•fprsa 

LLyL LLLLy a 


■f""t~cccAC/crfC 
l LLyLayLyL 


A t" cocct t ct 

a LL y LL • L LL« L 


a tccrcct tct 


t gacgagt t c 


2880 


l ll- Lycty y 


yaLLLLyyyy 


l LLy ddd l y d 


LLy aovoay v. » 


cracocccaac 


ctgccatcac 


2940 


yaya l l Ltya 


L LLLaLLyLL 


erect t ci" a t cr 

yLULv-LLuuy 


aaacrcfttcrcicf 

a a ay y w y y y 


cttcac/aatc 


gttttccggg 


3000 


at-y LL.yy l. uy 


y Ct L yd ILL LL 


LayL.yL.yyyy 


at ct cat crct 

d L L L» LCC L» y W w> 


cr era crt tct tc 

y y ay u. w ^— w 


gcccaccccc 


3060 


4- /■« /-"•+- +- +- a ^ 4" 

CyCL LLadLy 


dydLdUyoyd 


ydL.yL»L.LClLy 


d LLyLfl Lya l 


atttactttc 

dL LLyLLLLL 


aatt ctcrttcr 


3120 


LyLaLy Liy l 


dadaoaLL uy 


ay LaLy Ly l d 


fTpfpacia tec 
y l l Lay d ll l 


1 1 a cccrcccr cr 

L LdLLyLLyy 


tttccfcrtt ca 


3180 


4- 4- /—* 4— 2j 5j 4- a 
L LCLdaLydd 


LdLdLCdLLL 


y L L dL Ld LLy 


4* a 4- 4- 4- 4- 4- a+* fr 
LC ILL LL Ld Ly 


piAtaatatt"C 

d d Ldd Ld L LL 


tcccrttcaat 


3240 


4- 4- — > /~« 4- it -a 4- 4- /t 


4- o /■■« /"< /™»4* aofa 

laCCCLaCLa 


/^4*4"_i4-_i4"rT4"a 
L. L Ld LaLy Ld 


Ldd Ld L L d dd 


d LyddddLdd 


t a t a 1 1 crt crc 

v^- a _.a u Ly w 


3300 


tgaataggtt tatagcgaca tctatgatag agcgccacaa taacaaacaa ttgcgtttta 3360
ttattacaaa tccaatttta aaaaaagcgg cagaaccggt caaacctaaa agactgatta 3420
cataaatctt attcaaattt caaaaggccc caggggctag tatctacgac acaccgagcg 3480
gcgaactaat aacgttcact gaagggaact ccggttcccc gccggcgcgc atgggtgaga 3540
ttccttgaag ttgagtattg gccgtccgct ctaccgaaag ttacgggcac cattcaaccc 3600
ggtccagcac ggcggccggg taaccgactt gctgccccga gaattatgca gcattttttt 3660



ggtgtatgtg ggccctgtac agcggccgcg ttaacgcgta tactctagta tgcaccatac 3720
atggagtcaa aaattcagat cgaggatcta acagaactcg ccgtgaagac tggcgaacag 3780
ttcatacaga gtcttttacg actcaatgac aagaagaaaa tcttcgtcaa catggtggag 3840
cacgacactc tcgtctactc caagaatatc aaagatacag tctcagaaga ccaaagggct 3900
attgagactt ttcaacaaag ggtaatatcg ggaaacctcc tcggattcca ttgcccagct 3960
atctgtcact tcatcaaaag gacagtagaa aaggaaggtg gcacctacaa atgccatcat 4020
tgcgataaag gaaaggctat cgttcaagat gcctctgccg acagtggtcc caaagatgga 4080
cccccaccca cgaggagcat cgtggaaaaa gaagacgttc caaccacgtc ttcaaagcaa 4140
gtggattgat gtgatatctc cactgacgta agggatgacg cacaatccca ctatccttcg 4200
caagaccctt cctctatata aggaagttca tttcatttgg agaggactcg agaattaagc 4260
aaaagaagaa gaagaagaag tccaaaacca tggctaaacc ccccaagaag aagcgcaagg 4320
ttaacatcaa aaaaaaccag gtaatgaacc tgggtccgaa ctctaaactg ctgaaagaat 4380
acaaatccca gctgatcgaa ctgaacatcg aacagttcga agcaggtatc ggtctgatcc 4440
tgggtgatgc ttacatccgt tctcgtgatg aaggtaaaac ctactgtatg cagttcgagt 4500
ggaaaaacaa agcatacatg gaccacgtat gtctgctgta cgatcagtgg gtactgtccc 4560
cgccgcacaa aaaagaacgt gttaaccacc tgggtaacct ggtaatcacc tggggcgccc 4620
agactttcaa acaccaagct ttcaacaaac tggctaacct gttcatcgtt aacaacaaaa 4680
aaaccatccc gaacaacctg gttgaaaact acctgacccc gatgtctctg gcatactggt 4740
tcatggatga tggtggtaaa tgggattaca acaaaaactc taccaacaaa gtattgtact 4800
gaacacccag tctttcactt tcgaagaagt agaatacctg gttaagggtc tgcgtaacaa 4860
attccaactg aactgttacg taaaaatcaa caaaaacaaa ccgatcatct acatcgattc 4920
tatgtcttac ctgatcttct acaacctgat caaaccgtac ctgatcccgc agatgatgta 4980
caaactgccg aacactatct cctccgaaac tttcctgaaa tagggctagc aagcttggac 5040
acgctgaaat caccagtctc tctctacaaa tctatctctc tctattttct ccataataat 5100


gtgtgagtag ttcccagata agggaattag ggttcctata gggtttcgct catgtgttga 5160
gcatataaga aacccttagt atgtatttgt atttgtaaaa tacttctatc aataaaattt 5220


ctaattccta aaaccaaaat ccagtactaa aatccagatc atgcatggta cagcggccgc 5280
gttaacgcgt atactctaga gcgatcgcca tggagccatt tacaattgaa tatatcctgc 5340
cgccg 5345











<210> 7
<211> 4066
<212> DNA
<213> Artificial

<220>
<223> pCV78

<220>
<221> misc_feature
<222> (234)..(763)
<223> P35S2 promoter

<220>
<221> misc_feature
<222> (764)..(805)
<223> Atslb'

<220>
<221> misc_feature
<222> (808)..(839)
<223> nuclear localization signal

<220>
<221> misc_feature
<222> (840)..(1541)
<223> I-SceI synthetic

<220>
<221> misc_feature
<222> (1544)..(1792)
<223> 3' 35S

<220>
<221> misc_feature
<222> (3006)..(3886)
<223> Ampicillin resistance (complement)

<400> 7
tcgcgcgttt cggtgatgac ggtgaaaacc tctgacacat gcagctcccg gagacggtca 60
cagcttgtct gtaagcggat gccgggagca gacaagcccg tcagggcgcg tcagcgggtg 120
ttggcgggtg tcggggctgg cttaactatg cggcatcaga gcagattgta ctgagagtgc 180
accatacctg caggcaattg gtacctacgt atgcatggcg cgccatatgc accatacatg 240
gagtcaaaaa ttcagatcga ggatctaaca gaactcgccg tgaagactgg cgaacagttc 300






atacagagtc ttttacgact caatgacaag aagaaaatct tcgtcaacat ggtggagcac 360
gacactctcg tctactccaa gaatatcaaa gatacagtct cagaagacca aagggctatt 420
gagacttttc aacaaagggt aatatcggga aacctcctcg gattccattg cccagctatc 480
tgtcacttca tcaaaaggac agtagaaaag gaaggtggca cctacaaatg ccatcattgc 540
gataaaggaa aggctatcgt tcaagatgcc tctgccgaca gtggtcccaa agatggaccc 600
ccacccacga ggagcatcgt ggaaaaagaa gacgttccaa ccacgtcttc aaagcaagtg 660
gattgatgtg atatctccac tgacgtaagg gatgacgcac aatcccacta tccttcgcaa 720
gacccttcct ctatataagg aagttcattt catttggaga ggactcgaga attaagcaaa 780
agaagaagaa gaagaagtcc aaaaccatgg ccaagcctcc caagaagaag cgcaaagtga 840
acatcaagaa gaaccaggtg atgaacctgg gacctaacag caagctcctg aaggagtaca 900
agagccagct gatcgaactg aacatcgagc agttcgaagc tggcatcggc ctgatcctgg 960
gcgatgccta catcagatcc cgggacgaag gcaagaccta ctgcatgcag ttcgagtgga 1020
agaacaaggc ctacatggac cacgtgtgtc tgctgtacga ccagtgggtc ctgagccctc 1080
ctcacaagaa ggagcgcgtg aaccatctgg gcaacctcgt gatcacctgg ggagcccaga 1140
ccttcaagca ccaggccttc aacaagctgg ccaacctgtt catcgtgaac aacaagaaga 1200
ccatccccaa caacctcgtg gagaactacc tcactcccat gagcctggcc tactggttca 1260
tggacgacgg aggcaagtgg gactacaaca agaacagcac caacaagtca attgtgctga 1320
acacccaaag cttcaccttc gaagaagtgg agtacctcgt caagggcctg cgcaacaagt 1380
tccagctgaa ctgctacgtg aagatcaaca agaacaagcc tatcatctac atcgacagca 1440
tgagctacct gatcttctac aacctgatca agccatacct gatccctcag atgatgtaca 1500
agctgcccaa caccatcagc agcgagacct tcctgaagtg aggctagcaa gcttggacac 1560
gctgaaatca ccagtctctc tctacaaatc tatctctctc tattttctcc ataataatgt 1620
gtgagtagtt cccagataag ggaattaggg ttcctatagg gtttcgctca tgtgttgagc 1680
atataagaaa cccttagtat gtatttgtat ttgtaaaata cttctatcaa taaaatttct 1740
aattcctaaa accaaaatcc agtactaaaa tccagatcat gcatggtaca gcggccgcgt 1800
taacgcgtat actctagagc gatcgcaagc ttggcgtaat catggtcata gctgtttcct 1860
gtgtgaaatt gttatccgct cacaattcca cacaacatac gagccggaag cataaagtgt 1920
aaagcctggg gtgcctaatg agtgagctaa ctcacattaa ttgcgttgcg ctcactgccc 1980
gctttccagt cgggaaacct gtcgtgccag ctgcattaat gaatcggcca acgcgcgggg 2040






agaggcggtt tgcgtattgg gcgctcttcc gcttcctcgc tcactgactc gctgcgctcg 2100
gtcgttcggc tgcggcgagc ggtatcagct cactcaaagg cggtaatacg gttatccaca 2160
gaatcagggg ataacgcagg aaagaacatg tgagcaaaag gccagcaaaa ggccaggaac 2220
cgtaaaaagg ccgcgttgct ggcgtttttc cataggctcc gcccccctga cgagcatcac 2280
aaaaatcgac gctcaagtca gaggtggcga aacccgacag gactataaag ataccaggcg 2340
tttccccctg gaagctccct cgtgcgctct cctgttccga ccctgccgct taccggatac 2400
ctgtccgcct ttctcccttc gggaagcgtg gcgctttctc aaagctcacg ctgtaggtat 2460
ctcagttcgg tgtaggtcgt tcgctccaag ctgggctgtg tgcacgaacc ccccgttcag 2520
cccgaccgct gcgccttatc cggtaactat cgtcttgagt ccaacccggt aagacacgac 2580
ttatcgccac tggcagcagc cactggtaac aggattagca gagcgaggta tgtaggcggt 2640
gctacagagt tcttgaagtg gtggcctaac tacggctaca ctagaagaac agtatttggt 2700
atctgcgctc tgctgaagcc agttaccttc ggaaaaagag ttggtagctc ttgatccggc 2760
aaacaaacca ccgctggtag cggtggtttt tttgtttgca agcagcagat tacgcgcaga 2820
aaaaaaggat ctcaagaaga tcctttgatc ttttctacgg ggtctgacgc tcagtggaac 2880
gaaaactcac gttaagggat tttggtcatg agattatcaa aaaggatctt cacctagatc 2940
cttttaaatt aaaaatgaag ttttaaatca atctaaagta tatatgagta aacttggtct 3000
gacagttacc aatgcttaat cagtgaggca cctatctcag cgatctgtct atttcgttca 3060
tccatagttg cctgactccc cgtcgtgtag ataactacga tacgggaggg cttaccatct 3120
ggccccagtg ctgcaatgat accgcgagac ccacgctcac cggctccaga tttatcagca 3180
ataaaccagc cagccggaag ggccgagcgc agaagtggtc ctgcaacttt atccgcctcc 3240
atccagtcta ttaattgttg ccgggaagct agagtaagta gttcgccagt taatagtttg 3300
cgcaacgttg ttgccattgc tacaggcatc gtggtgtcac gctcgtcgtt tggtatggct 3360
tcattcagct ccggttccca acgatcaagg cgagttacat gatcccccat gttgtgcaaa 3420
aaagcggtta gctccttcgg tcctccgatc gttgtcagaa gtaagttggc cgcagtgtta 3480
tcactcatgg ttatggcagc actgcataat tctcttactg tcatgccatc cgtaagatgc 3540
ttttctgtga ctggtgagta ctcaaccaag tcattctgag aatagtgtat gcggcgaccg 3600
agttgctctt gcccggcgtc aatacgggat aataccgcgc cacatagcag aactttaaaa 3660
gtgctcatca ttggaaaacg ttcttcgggg cgaaaactct caaggatctt accgctgttg 3720
agatccagtt cgatgtaacc cactcgtgca cccaactgat cttcagcatc ttttactttc 3780
accagcgttt ctgggtgagc aaaaacagga aggcaaaatg ccgcaaaaaa gggaataagg 3840
gcgacacgga aatgttgaat actcatactc ttcctttttc aatattattg aagcatttat 3900
cagggttatt gtctcatgag cggatacata tttgaatgta tttagaaaaa taaacaaata 3960
ggggttccgc gcacatttcc ccgaaaagtg ccacctgacg tctaagaaac cattattatc 4020
atgacattaa cctataaaaa taggcgtatc acgaggccct ttcgtc 4066